linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/2] Detect stalls on guest vCPUS
@ 2022-06-16  9:27 Sebastian Ene
  2022-06-16  9:27 ` [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
  2022-06-16  9:27 ` [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  0 siblings, 2 replies; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16  9:27 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, vdonnefort, Guenter Roeck,
	Sebastian Ene

This adds a mechanism to detect stalls on the guest vCPUS by creating a
per CPU hrtimer which periodically 'pets' the host backend driver.
On a conventional watchdog-core driver, the userspace is responsible for
delivering the 'pet' events by writing to the particular /dev/watchdogN node.
In this case we require a strong thread affinity to be able to
account for lost time on a per vCPU basis.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapsed time between subsequent 'pet' events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked and resets the guest. The host
backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
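
For reference, the counter arithmetic used by the guest driver in patch 2/2
can be sketched in plain C as below. This is a standalone userspace
illustration, not part of the patch: the helper names stall_ticks() and
pet_interval_ms() are made up for the example, and the overflow assertions
are an addition that the driver itself does not have.

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u

/*
 * Value written to the load-count register on every 'pet':
 * clock frequency (Hz) multiplied by the expiration timeout (s).
 * Hypothetical helper; the assert guards against u32 overflow.
 */
static uint32_t stall_ticks(uint32_t clock_hz, uint32_t timeout_sec)
{
	assert(timeout_sec == 0 || clock_hz <= UINT32_MAX / timeout_sec);
	return clock_hz * timeout_sec;
}

/*
 * Interval between two 'pet' events in milliseconds: half of the
 * expiration timeout, mirroring the ping_timeout_ms computation
 * in the probe function of patch 2/2.
 */
static uint32_t pet_interval_ms(uint32_t timeout_sec)
{
	assert(timeout_sec <= UINT32_MAX / MSEC_PER_SEC);
	return timeout_sec * MSEC_PER_SEC / 2;
}
```

With the binding defaults (clock-frequency = 10, timeout-sec = 8) this gives
a load value of 80 ticks and one 'pet' every 4000 ms.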

Changelog v6:
 - fix issues reported by lkp@intel robot

Changelog v5:
 - fix dt warnings
 - rename %s/watchdog/stall_detector/g
 - rename the config from Kconfig VM_WATCHDOG -> VCPU_STALL_DETECTOR

Changelog v4:
 - rename the source from vm-wdt.c -> vm-watchdog.c
 - convert all the error logging calls from pr_* to dev_* calls
 - rename the DTS node "clock" to "clock-frequency"

Changelog v3:
 - cosmetic fixes, remove pr_info and version information
 - improve description in the commit message
 - improve description in the Kconfig help section

Sebastian Ene (2):
  dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector
    compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../bindings/misc/vcpu_stall_detector.yaml    |  49 ++++
 drivers/misc/Kconfig                          |  12 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vcpu_stall_detector.c            | 222 ++++++++++++++++++
 4 files changed, 284 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
 create mode 100644 drivers/misc/vcpu_stall_detector.c

-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible
  2022-06-16  9:27 [PATCH v6 0/2] Detect stalls on guest vCPUS Sebastian Ene
@ 2022-06-16  9:27 ` Sebastian Ene
  2022-06-16 14:05   ` Rob Herring
  2022-06-16 15:30   ` Rob Herring
  2022-06-16  9:27 ` [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  1 sibling, 2 replies; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16  9:27 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, vdonnefort, Guenter Roeck,
	Sebastian Ene

The VCPU stall detection mechanism allows configuring the expiration
duration and the internal counter clock frequency measured in Hz.
Add these properties in the schema.

While this is a memory mapped virtual device, it is expected to be loaded
when the DT contains the compatible: "qemu,vcpu-stall-detector" node.
In a protected VM we trust the generated DT nodes and we don't rely on
the host to present the hardware peripherals.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 .../bindings/misc/vcpu_stall_detector.yaml    | 49 +++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml

diff --git a/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml b/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
new file mode 100644
index 000000000000..55323676194b
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/misc/vcpu_stall_detector.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: VCPU stall detector
+
+description: |
+  This binding describes a CPU stall detector mechanism for virtual cpus
+  which is accessed through MMIO.
+
+maintainers:
+  - Sebastian Ene <sebastianene@google.com>
+
+properties:
+  compatible:
+    enum:
+      - qemu,vcpu-stall-detector
+
+  clock-frequency:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The internal clock of the stall detector peripheral measured in Hz used
+      to decrement its internal counter register on each tick.
+      Defaults to 10 if unset.
+
+  timeout-sec:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The stall detector expiration timeout measured in seconds.
+      Defaults to 8 if unset. Please note that it also takes into account the
+      time spent while the VCPU is not running.
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    vmwdt@9030000 {
+      compatible = "qemu,vcpu-stall-detector";
+      clock-frequency = <10>;
+      timeout-sec = <8>;
+      reg = <0x0 0x9030000 0x0 0x10000>;
+    };
+
+...
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16  9:27 [PATCH v6 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-06-16  9:27 ` [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
@ 2022-06-16  9:27 ` Sebastian Ene
  2022-06-16 10:08   ` Greg Kroah-Hartman
  2022-06-16 10:10   ` Greg Kroah-Hartman
  1 sibling, 2 replies; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16  9:27 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, vdonnefort, Guenter Roeck,
	Sebastian Ene, kernel test robot

This driver creates per-cpu hrtimers which are required to do the
periodic 'pet' operation. On a conventional watchdog-core driver, the
userspace is responsible for delivering the 'pet' events by writing to
the particular /dev/watchdogN node. In this case we require a strong
thread affinity to be able to account for lost time on a per-vCPU basis.

This part of the driver is the 'frontend' which is responsible for
delivering the periodic 'pet' events, configuring the virtual peripheral
and listening for cpu hotplug events. The other part of the driver
handles the peripheral emulation and this part accounts for lost time by
looking at the /proc/{}/task/{}/stat entries and is located here:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 drivers/misc/Kconfig               |  12 ++
 drivers/misc/Makefile              |   1 +
 drivers/misc/vcpu_stall_detector.c | 222 +++++++++++++++++++++++++++++
 3 files changed, 235 insertions(+)
 create mode 100644 drivers/misc/vcpu_stall_detector.c
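
As a reading aid for the probe path below: each vCPU gets its own
REG_LEN-sized window inside the MMIO region (membase + cpu * REG_LEN).
A minimal userspace sketch of that offset computation follows. The register
offsets are copied from this patch, while vcpu_reg_offset() and its
assertion are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets inside one per-vCPU window, as defined in the patch. */
#define REG_STATUS		0x00
#define REG_LOAD_CNT		0x04
#define REG_CURRENT_CNT		0x08
#define REG_CLOCK_FREQ_HZ	0x0C
#define REG_LEN			0x10

/*
 * Byte offset of register 'reg' for vCPU 'cpu', relative to the start
 * of the MMIO region. Hypothetical helper: the driver instead stores
 * membase + cpu * REG_LEN per CPU and adds the register offset on access.
 */
static size_t vcpu_reg_offset(unsigned int cpu, unsigned int reg)
{
	/* Registers are 32 bits wide and must lie inside the window. */
	assert(reg < REG_LEN && (reg % 4) == 0);
	return (size_t)cpu * REG_LEN + reg;
}
```

Under this layout the load-count register of vCPU 2 sits at offset 0x24 from
the start of the region. If the 0x10000-byte reg size from the binding
example is used, the region has room for 0x10000 / 0x10 = 4096 such windows.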

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 41d2bb0ae23a..9b3cb5dfd5a7 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -483,6 +483,18 @@ config OPEN_DICE
 
 	  If unsure, say N.
 
+config VCPU_STALL_DETECTOR
+	tristate "VCPU stall detector"
+	select LOCKUP_DETECTOR
+	help
+	  Detect CPU locks on the virtual machine. This driver relies on the
+	  hrtimers which are CPU-pinned to do the 'pet' operation. When a vCPU
+	  has to do a 'pet', it exits the guest through MMIO write and the
+	  backend driver takes into account the lost ticks for this particular
+	  CPU.
+	  To compile this driver as a module, choose M here: the
+	  module will be called vcpu_stall_detector.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 70e800e9127f..2be8542616dd 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -60,3 +60,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
 obj-$(CONFIG_HI6421V600_IRQ)	+= hi6421v600-irq.o
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
+obj-$(CONFIG_VCPU_STALL_DETECTOR)	+= vcpu_stall_detector.o
\ No newline at end of file
diff --git a/drivers/misc/vcpu_stall_detector.c b/drivers/misc/vcpu_stall_detector.c
new file mode 100644
index 000000000000..8b33f04a9719
--- /dev/null
+++ b/drivers/misc/vcpu_stall_detector.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// VCPU stall detector.
+//  Copyright (C) Google, 2022
+
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/param.h>
+#include <linux/percpu.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#define REG_STATUS		(0x00)
+#define REG_LOAD_CNT		(0x04)
+#define REG_CURRENT_CNT		(0x08)
+#define REG_CLOCK_FREQ_HZ	(0x0C)
+#define REG_LEN			(0x10)
+
+#define DEFAULT_CLOCK_HZ	(10)
+#define DEFAULT_TIMEOT_SEC	(8)
+
+struct vm_stall_detect_s {
+	void __iomem *membase;
+	u32 clock_freq;
+	u32 expiration_sec;
+	u32 ping_timeout_ms;
+	struct hrtimer per_cpu_hrtimer;
+	struct platform_device *dev;
+};
+
+#define vcpu_stall_detect_reg_write(stall_detect, reg, value)	\
+	iowrite32((value), (stall_detect)->membase + (reg))
+#define vcpu_stall_detect_reg_read(stall_detect, reg)		\
+	ioread32((stall_detect)->membase + (reg))
+
+static struct platform_device *virt_dev;
+
+static enum hrtimer_restart
+vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
+{
+	struct vm_stall_detect_s *cpu_stall_detect;
+	u32 ticks;
+
+	cpu_stall_detect = container_of(hrtimer, struct vm_stall_detect_s,
+					per_cpu_hrtimer);
+	ticks = cpu_stall_detect->clock_freq *
+		cpu_stall_detect->expiration_sec;
+	vcpu_stall_detect_reg_write(cpu_stall_detect, REG_LOAD_CNT, ticks);
+	hrtimer_forward_now(hrtimer,
+			    ms_to_ktime(cpu_stall_detect->ping_timeout_ms));
+
+	return HRTIMER_RESTART;
+}
+
+static void vcpu_stall_detect_start(void *arg)
+{
+	u32 ticks;
+	struct vm_stall_detect_s *cpu_stall_detect = arg;
+	struct hrtimer *hrtimer = &cpu_stall_detect->per_cpu_hrtimer;
+
+	vcpu_stall_detect_reg_write(cpu_stall_detect, REG_CLOCK_FREQ_HZ,
+			cpu_stall_detect->clock_freq);
+
+	/* Compute the number of ticks required for the stall detector counter
+	 * register based on the internal clock frequency and the timeout
+	 * value given from the device tree.
+	 */
+	ticks = cpu_stall_detect->clock_freq *
+		cpu_stall_detect->expiration_sec;
+	vcpu_stall_detect_reg_write(cpu_stall_detect, REG_LOAD_CNT, ticks);
+
+	/* Enable the internal clock and start the stall detector */
+	vcpu_stall_detect_reg_write(cpu_stall_detect, REG_STATUS, 1);
+
+	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer->function = vcpu_stall_detect_timer_fn;
+	hrtimer_start(hrtimer, ms_to_ktime(cpu_stall_detect->ping_timeout_ms),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static void vcpu_stall_detect_stop(void *arg)
+{
+	struct vm_stall_detect_s *cpu_stall_detect = arg;
+	struct hrtimer *hrtimer = &cpu_stall_detect->per_cpu_hrtimer;
+
+	hrtimer_cancel(hrtimer);
+
+	/* Disable the stall detector */
+	vcpu_stall_detect_reg_write(cpu_stall_detect, REG_STATUS, 0);
+}
+
+static int start_stall_detector_on_cpu(unsigned int cpu)
+{
+	struct vm_stall_detect_s __percpu *vm_stall_detect;
+
+	vm_stall_detect = (struct vm_stall_detect_s __percpu *)
+		platform_get_drvdata(virt_dev);
+	vcpu_stall_detect_start(this_cpu_ptr(vm_stall_detect));
+	return 0;
+}
+
+static int stop_stall_detector_on_cpu(unsigned int cpu)
+{
+	struct vm_stall_detect_s __percpu *vm_stall_detect;
+
+	vm_stall_detect = (struct vm_stall_detect_s __percpu *)
+		platform_get_drvdata(virt_dev);
+	vcpu_stall_detect_stop(this_cpu_ptr(vm_stall_detect));
+	return 0;
+}
+
+static int vcpu_stall_detect_probe(struct platform_device *dev)
+{
+	int cpu, ret, err;
+	void __iomem *membase;
+	struct resource *r;
+	struct vm_stall_detect_s __percpu *vm_stall_detect;
+	u32 stall_detect_clock, stall_detect_timeout_sec = 0;
+
+	r = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (r == NULL)
+		return -ENOENT;
+
+	vm_stall_detect = alloc_percpu(typeof(struct vm_stall_detect_s));
+	if (!vm_stall_detect)
+		return -ENOMEM;
+
+	membase = ioremap(r->start, resource_size(r));
+	if (!membase) {
+		ret = -ENXIO;
+		goto err_withmem;
+	}
+
+	virt_dev = dev;
+	platform_set_drvdata(dev, vm_stall_detect);
+	if (of_property_read_u32(dev->dev.of_node, "clock-frequency",
+				 &stall_detect_clock))
+		stall_detect_clock = DEFAULT_CLOCK_HZ;
+
+	if (of_property_read_u32(dev->dev.of_node, "timeout-sec",
+				 &stall_detect_timeout_sec))
+		stall_detect_timeout_sec = DEFAULT_TIMEOT_SEC;
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_stall_detect_s *cpu_stall_detect;
+
+		cpu_stall_detect = per_cpu_ptr(vm_stall_detect, cpu);
+		cpu_stall_detect->membase = membase + cpu * REG_LEN;
+		cpu_stall_detect->clock_freq = stall_detect_clock;
+		cpu_stall_detect->expiration_sec = stall_detect_timeout_sec;
+		cpu_stall_detect->ping_timeout_ms = stall_detect_timeout_sec *
+			MSEC_PER_SEC / 2;
+		smp_call_function_single(cpu, vcpu_stall_detect_start,
+					 cpu_stall_detect, true);
+	}
+
+	err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					"virt/vcpu_stall_detector:online",
+					start_stall_detector_on_cpu,
+					stop_stall_detector_on_cpu);
+	if (err < 0) {
+		dev_warn(&dev->dev, "failed to install cpu hotplug\n");
+		ret = err;
+		goto err_withmem;
+	}
+
+	return 0;
+
+err_withmem:
+	free_percpu(vm_stall_detect);
+	return ret;
+}
+
+static int vcpu_stall_detect_remove(struct platform_device *dev)
+{
+	int cpu;
+	struct vm_stall_detect_s __percpu *vm_stall_detect;
+
+	vm_stall_detect = (struct vm_stall_detect_s __percpu *)
+		platform_get_drvdata(dev);
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_stall_detect_s *cpu_stall_detect;
+
+		cpu_stall_detect = per_cpu_ptr(vm_stall_detect, cpu);
+		smp_call_function_single(cpu, vcpu_stall_detect_stop,
+					 cpu_stall_detect, true);
+	}
+
+	free_percpu(vm_stall_detect);
+	return 0;
+}
+
+static const struct of_device_id vcpu_stall_detect_of_match[] = {
+	{ .compatible = "qemu,vcpu-stall-detector", },
+	{}
+};
+
+MODULE_DEVICE_TABLE(of, vcpu_stall_detect_of_match);
+
+static struct platform_driver vcpu_stall_detect_driver = {
+	.probe  = vcpu_stall_detect_probe,
+	.remove = vcpu_stall_detect_remove,
+	.driver = {
+		.name           = KBUILD_MODNAME,
+		.of_match_table = vcpu_stall_detect_of_match,
+	},
+};
+
+module_platform_driver(vcpu_stall_detect_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Sebastian Ene <sebastianene@google.com>");
+MODULE_DESCRIPTION("VCPU stall detector");
-- 
2.36.1.476.g0c4daa206d-goog



* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16  9:27 ` [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-06-16 10:08   ` Greg Kroah-Hartman
  2022-06-16 13:08     ` Guenter Roeck
  2022-06-16 16:01     ` Sebastian Ene
  2022-06-16 10:10   ` Greg Kroah-Hartman
  1 sibling, 2 replies; 12+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-16 10:08 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Guenter Roeck,
	kernel test robot

On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
> This driver creates per-cpu hrtimers which are required to do the
> periodic 'pet' operation. On a conventional watchdog-core driver, the
> userspace is responsible for delivering the 'pet' events by writing to
> the particular /dev/watchdogN node. In this case we require a strong
> thread affinity to be able to account for lost time on a per vCPU.
> 
> This part of the driver is the 'frontend' which is responsible for
> delivering the periodic 'pet' events, configuring the virtual peripheral
> and listening for cpu hotplug events. The other part of the driver
> handles the peripheral emulation and this part accounts for lost time by
> looking at the /proc/{}/task/{}/stat entries and is located here:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Reported-by: kernel test robot <lkp@intel.com>

The robot reported stalls on vcpus?

I think you need to fix this up...

greg k-h


* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16  9:27 ` [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  2022-06-16 10:08   ` Greg Kroah-Hartman
@ 2022-06-16 10:10   ` Greg Kroah-Hartman
  2022-06-16 16:03     ` Sebastian Ene
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-16 10:10 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Guenter Roeck,
	kernel test robot

On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
> This driver creates per-cpu hrtimers which are required to do the
> periodic 'pet' operation. On a conventional watchdog-core driver, the
> userspace is responsible for delivering the 'pet' events by writing to
> the particular /dev/watchdogN node. In this case we require a strong
> thread affinity to be able to account for lost time on a per vCPU.
> 
> This part of the driver is the 'frontend' which is responsible for
> delivering the periodic 'pet' events, configuring the virtual peripheral
> and listening for cpu hotplug events. The other part of the driver
> handles the peripheral emulation and this part accounts for lost time by
> looking at the /proc/{}/task/{}/stat entries and is located here:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  drivers/misc/Kconfig               |  12 ++
>  drivers/misc/Makefile              |   1 +
>  drivers/misc/vcpu_stall_detector.c | 222 +++++++++++++++++++++++++++++
>  3 files changed, 235 insertions(+)
>  create mode 100644 drivers/misc/vcpu_stall_detector.c
> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 41d2bb0ae23a..9b3cb5dfd5a7 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -483,6 +483,18 @@ config OPEN_DICE
>  
>  	  If unsure, say N.
>  
> +config VCPU_STALL_DETECTOR
> +	tristate "VCPU stall detector"
> +	select LOCKUP_DETECTOR
> +	help
> +	  Detect CPU locks on the virtual machine. This driver relies on the
> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> +	  has to do a 'pet', it exits the guest through MMIO write and the
> +	  backend driver takes into account the lost ticks for this particular
> +	  CPU.

which virtual machine framework is this for?  kvm?  xen?  hyperv?
vmware?  something else?

Specifics please...

thanks,

greg k-h


* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16 10:08   ` Greg Kroah-Hartman
@ 2022-06-16 13:08     ` Guenter Roeck
  2022-06-16 16:05       ` Sebastian Ene
  2022-06-16 16:01     ` Sebastian Ene
  1 sibling, 1 reply; 12+ messages in thread
From: Guenter Roeck @ 2022-06-16 13:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, kernel test robot

On 6/16/22 03:08, Greg Kroah-Hartman wrote:
> On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
>> This driver creates per-cpu hrtimers which are required to do the
>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>> userspace is responsible for delivering the 'pet' events by writing to
>> the particular /dev/watchdogN node. In this case we require a strong
>> thread affinity to be able to account for lost time on a per vCPU.
>>
>> This part of the driver is the 'frontend' which is responsible for
>> delivering the periodic 'pet' events, configuring the virtual peripheral
>> and listening for cpu hotplug events. The other part of the driver
>> handles the peripheral emulation and this part accounts for lost time by
>> looking at the /proc/{}/task/{}/stat entries and is located here:
>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>
>> Reported-by: kernel test robot <lkp@intel.com>
> 
> The robot reported stalls on vcpus?
> 

I have seen this a number of times when people fix issues reported by
the robot in their submissions, just because the robot asks them to
do so. This should really be part of the change log, such as

v17: Fixed the following issues reported by the kernel test robot:
      ...

Guenter


* Re: [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible
  2022-06-16  9:27 ` [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
@ 2022-06-16 14:05   ` Rob Herring
  2022-06-16 15:30   ` Rob Herring
  1 sibling, 0 replies; 12+ messages in thread
From: Rob Herring @ 2022-06-16 14:05 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Greg Kroah-Hartman, Arnd Bergmann, vdonnefort, maz, will,
	Guenter Roeck, Rob Herring, Dragan Cvetic, linux-kernel,
	devicetree

On Thu, 16 Jun 2022 09:27:38 +0000, Sebastian Ene wrote:
> The VCPU stall detection mechanism allows to configure the expiration
> duration and the internal counter clock frequency measured in Hz.
> Add these properties in the schema.
> 
> While this is a memory mapped virtual device, it is expected to be loaded
> when the DT contains the compatible: "qemu,vcpu-stall-detector" node.
> In a protected VM we trust the generated DT nodes and we don't rely on
> the host to present the hardware peripherals.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../bindings/misc/vcpu_stall_detector.yaml    | 49 +++++++++++++++++++
>  1 file changed, 49 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'}
	hint: Standard unit suffix properties don't need a type $ref
	from schema $id: http://devicetree.org/meta-schemas/core.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml: ignoring, error in schema: properties: timeout-sec
Documentation/devicetree/bindings/misc/vcpu_stall_detector.example.dtb:0:0: /example-0/vmwdt@9030000: failed to match any schema with compatible: ['qemu,vcpu-stall-detector']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



* Re: [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible
  2022-06-16  9:27 ` [PATCH v6 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
  2022-06-16 14:05   ` Rob Herring
@ 2022-06-16 15:30   ` Rob Herring
  1 sibling, 0 replies; 12+ messages in thread
From: Rob Herring @ 2022-06-16 15:30 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Guenter Roeck

On Thu, Jun 16, 2022 at 09:27:38AM +0000, Sebastian Ene wrote:
> The VCPU stall detection mechanism allows to configure the expiration
> duration and the internal counter clock frequency measured in Hz.
> Add these properties in the schema.
> 
> While this is a memory mapped virtual device, it is expected to be loaded
> when the DT contains the compatible: "qemu,vcpu-stall-detector" node.
> In a protected VM we trust the generated DT nodes and we don't rely on
> the host to present the hardware peripherals.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../bindings/misc/vcpu_stall_detector.yaml    | 49 +++++++++++++++++++

qemu,vcpu-stall-detector.yaml

>  1 file changed, 49 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
> 
> diff --git a/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml b/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
> new file mode 100644
> index 000000000000..55323676194b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/vcpu_stall_detector.yaml
> @@ -0,0 +1,49 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/misc/vcpu_stall_detector.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: VCPU stall detector
> +
> +description: |

Don't need '|' if no formatting.

> +  This binding describes a CPU stall detector mechanism for virtual cpus

s/cpus/CPUs/

> +  which is accessed through MMIO.
> +
> +maintainers:
> +  - Sebastian Ene <sebastianene@google.com>
> +
> +properties:
> +  compatible:
> +    enum:
> +      - qemu,vcpu-stall-detector
> +
> +  clock-frequency:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      The internal clock of the stall detector peripheral measure in Hz used
> +      to decrement its internal counter register on each tick.
> +      Defaults to 10 if unset.

       default: 10

> +
> +  timeout-sec:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      The stall detector expiration timeout measured in seconds.
> +      Defaults to 8 if unset. Please note that it also takes into account the
> +      time spent while the VCPU is not running.

       default: 8

> +
> +required:
> +  - compatible
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    vmwdt@9030000 {
> +      compatible = "qemu,vcpu-stall-detector";
> +      clock-frequency = <10>;
> +      timeout-sec = <8>;
> +      reg = <0x0 0x9030000 0x0 0x10000>;
> +    };
> +
> +...
> -- 
> 2.36.1.476.g0c4daa206d-goog
> 
> 


* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16 10:08   ` Greg Kroah-Hartman
  2022-06-16 13:08     ` Guenter Roeck
@ 2022-06-16 16:01     ` Sebastian Ene
  2022-06-16 16:09       ` Guenter Roeck
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16 16:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Guenter Roeck

On Thu, Jun 16, 2022 at 12:08:55PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per vCPU.
> > 
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver
> > handles the peripheral emulation and this part accounts for lost time by
> > looking at the /proc/{}/task/{}/stat entries and is located here:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> 

Hi,

> The robot reported stalls on vcpus?
> 
> I think you need to fix this up...

The robot reported some issues on v5 and after fixing them it
recommended to add this tag.

> 
> greg k-h

Thanks,
Seb


* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16 10:10   ` Greg Kroah-Hartman
@ 2022-06-16 16:03     ` Sebastian Ene
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16 16:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Guenter Roeck

On Thu, Jun 16, 2022 at 12:10:02PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per vCPU.
> > 
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver
> > handles the peripheral emulation and this part accounts for lost time by
> > looking at the /proc/{}/task/{}/stat entries and is located here:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> >  drivers/misc/Kconfig               |  12 ++
> >  drivers/misc/Makefile              |   1 +
> >  drivers/misc/vcpu_stall_detector.c | 222 +++++++++++++++++++++++++++++
> >  3 files changed, 235 insertions(+)
> >  create mode 100644 drivers/misc/vcpu_stall_detector.c
> > 
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 41d2bb0ae23a..9b3cb5dfd5a7 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -483,6 +483,18 @@ config OPEN_DICE
> >  
> >  	  If unsure, say N.
> >  
> > +config VCPU_STALL_DETECTOR
> > +	tristate "VCPU stall detector"
> > +	select LOCKUP_DETECTOR
> > +	help
> > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > +	  backend driver takes into account the lost ticks for this particular
> > +	  CPU.

Hi,

> 
> which virtual machine framework is this for?  kvm?  xen?  hyperv?
> vmware?  something else?
> 
> Specifics please...
> 
> thanks,

I will improve my description, thanks for taking a look.

> 
> greg k-h

Seb
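
For readers following the thread, the lost-time accounting described in the quoted commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the struct, the function names and the idea of comparing cumulative vCPU thread run time (as a backend could derive from the /proc/{}/task/{}/stat entries) against a timeout are hypothetical and are not taken from the patch or from the crosvm change.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-vCPU bookkeeping kept by a host backend.
 * This is NOT the register layout or the state used by crosvm.
 */
struct vcpu_stall_state {
	uint64_t last_run_ns;	/* vCPU thread run time sampled at the last 'pet' */
	uint64_t timeout_ns;	/* run time allowed between two 'pet' events */
};

/* Record a 'pet': remember how long the vCPU thread had run so far. */
static void vcpu_pet(struct vcpu_stall_state *s, uint64_t run_ns)
{
	s->last_run_ns = run_ns;
}

/*
 * The vCPU counts as stalled only if it actually ran for longer than
 * the timeout since the last 'pet'. Time during which the vCPU thread
 * was not scheduled does not advance run_ns, so it is not held
 * against the guest.
 */
static bool vcpu_is_stalled(const struct vcpu_stall_state *s, uint64_t run_ns)
{
	return run_ns - s->last_run_ns > s->timeout_ns;
}
```

In this sketch a backend would call vcpu_pet() from the MMIO write handler and vcpu_is_stalled() from a periodic check; measuring run time instead of wall-clock time is what keeps a descheduled vCPU from being reported as locked up.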

* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16 13:08     ` Guenter Roeck
@ 2022-06-16 16:05       ` Sebastian Ene
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Ene @ 2022-06-16 16:05 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort, Greg Kroah-Hartman,
	kernel test robot

On Thu, Jun 16, 2022 at 06:08:51AM -0700, Guenter Roeck wrote:
> On 6/16/22 03:08, Greg Kroah-Hartman wrote:
> > On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
> > > This driver creates per-cpu hrtimers which are required to do the
> > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > userspace is responsible for delivering the 'pet' events by writing to
> > > the particular /dev/watchdogN node. In this case we require a strong
> > > > thread affinity to be able to account for lost time on a per-vCPU basis.
> > > > 
> > > > This part of the driver is the 'frontend' which is responsible for
> > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > and listening for cpu hotplug events. The other part of the driver
> > > handles the peripheral emulation and this part accounts for lost time by
> > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > 
> > The robot reported stalls on vcpus?
> > 

Hi,

> 
> I have seen this a number of times when people fix issues reported by
> the robot in their submissions, just because the robot asks them to
> do so. This should really be part of the change log, such as
> 
> v17: Fixed the following issues reported by the kernel test robot:
>      ...
> 

I will add this in the changelog for v7.

> Guenter

Thanks,
Seb

* Re: [PATCH v6 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-06-16 16:01     ` Sebastian Ene
@ 2022-06-16 16:09       ` Guenter Roeck
  0 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-06-16 16:09 UTC (permalink / raw)
  To: Sebastian Ene, Greg Kroah-Hartman
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, vdonnefort

On 6/16/22 09:01, Sebastian Ene wrote:
> On Thu, Jun 16, 2022 at 12:08:55PM +0200, Greg Kroah-Hartman wrote:
>> On Thu, Jun 16, 2022 at 09:27:39AM +0000, Sebastian Ene wrote:
>>> This driver creates per-cpu hrtimers which are required to do the
>>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>>> userspace is responsible for delivering the 'pet' events by writing to
>>> the particular /dev/watchdogN node. In this case we require a strong
>>> thread affinity to be able to account for lost time on a per-vCPU basis.
>>>
>>> This part of the driver is the 'frontend' which is responsible for
>>> delivering the periodic 'pet' events, configuring the virtual peripheral
>>> and listening for cpu hotplug events. The other part of the driver
>>> handles the peripheral emulation and this part accounts for lost time by
>>> looking at the /proc/{}/task/{}/stat entries and is located here:
>>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>
> 
> Hi,
> 
>> The robot reported stalls on vcpus?
>>
>> I think you need to fix this up...
> 
> The robot reported some issues on v5 and after fixing them it
> recommended to add this tag.
> 

Only that doesn't make sense for patch sets which are still being worked
on. If you want to credit the robot, mention it in the change log.

Guenter
