kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/3] cpuidle-haltpoll driver (v2)
@ 2019-06-03 22:52 Marcelo Tosatti
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-03 22:52 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
amount of time before halting. This provides the following benefits
to host side polling:

        1) The POLL flag is set while polling is performed, which allows
           a remote vCPU to avoid sending an IPI (and the associated
           cost of handling the IPI) when performing a wakeup.

        2) The HLT VM-exit cost can be avoided.

The downside of guest side polling is that polling is performed
even with other runnable tasks in the host.

Results comparing halt_poll_ns and server/client application
where a small packet is ping-ponged:

host                                        --> 31.33
halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)

For the SAP HANA benchmarks (where idle_spin is a parameter
of the previous version of the patch, results should be the
same):

hpns == halt_poll_ns

                          idle_spin=0/   idle_spin=800/    idle_spin=0/
                          hpns=200000    hpns=0            hpns=800000
DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)

V2:

- Move from x86 to generic code (Paolo/Christian).
- Add auto-tuning logic (Paolo).
- Add MSR to disable host side polling (Paolo).




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-03 22:52 [patch 0/3] cpuidle-haltpoll driver (v2) Marcelo Tosatti
@ 2019-06-03 22:52 ` Marcelo Tosatti
  2019-06-05  8:16   ` Ankur Arora
                     ` (3 more replies)
  2019-06-03 22:52 ` [patch 2/3] kvm: x86: add host poll control msrs Marcelo Tosatti
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-03 22:52 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

[-- Attachment #1: 01-cpuidle-haltpoll --]
[-- Type: text/plain, Size: 13014 bytes --]

The cpuidle_kvm driver allows the guest vcpus to poll for a specified
amount of time before halting. This provides the following benefits
to host side polling:

        1) The POLL flag is set while polling is performed, which allows
           a remote vCPU to avoid sending an IPI (and the associated
           cost of handling the IPI) when performing a wakeup.

        2) The HLT VM-exit cost can be avoided.

The downside of guest side polling is that polling is performed
even with other runnable tasks in the host.

Results comparing halt_poll_ns and server/client application
where a small packet is ping-ponged:

host                                        --> 31.33
halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)

For the SAP HANA benchmarks (where idle_spin is a parameter
of the previous version of the patch, results should be the
same):

hpns == halt_poll_ns

                          idle_spin=0/   idle_spin=800/    idle_spin=0/
                          hpns=200000    hpns=0            hpns=800000
DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)

---
 Documentation/virtual/guest-halt-polling.txt |   78 ++++++++++++
 arch/x86/kernel/process.c                    |    2 
 drivers/cpuidle/Kconfig                      |   10 +
 drivers/cpuidle/Makefile                     |    1 
 drivers/cpuidle/cpuidle-haltpoll-trace.h     |   65 ++++++++++
 drivers/cpuidle/cpuidle-haltpoll.c           |  172 +++++++++++++++++++++++++++
 6 files changed, 327 insertions(+), 1 deletion(-)

Index: linux-2.6.git/Documentation/virtual/guest-halt-polling.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/Documentation/virtual/guest-halt-polling.txt	2019-06-03 19:31:36.003302371 -0300
@@ -0,0 +1,78 @@
+Guest halt polling
+==================
+
+The cpuidle_haltpoll driver allows the guest vcpus to poll for a specified
+amount of time before halting. This provides the following benefits
+to host side polling:
+
+	1) The POLL flag is set while polling is performed, which allows
+	   a remote vCPU to avoid sending an IPI (and the associated
+ 	   cost of handling the IPI) when performing a wakeup.
+
+	2) The HLT VM-exit cost can be avoided.
+
+The downside of guest side polling is that polling is performed
+even with other runnable tasks in the host.
+
+The basic logic as follows: A global value, guest_halt_poll_ns,
+is configured by the user, indicating the maximum amount of
+time polling is allowed. This value is fixed.
+
+Each vcpu has an adjustable guest_halt_poll_ns
+("per-cpu guest_halt_poll_ns"), which is adjusted by the algorithm
+in response to events (explained below).
+
+Module Parameters
+=================
+
+The cpuidle_haltpoll module has 5 tunable module parameters:
+
+1) guest_halt_poll_ns:
+Maximum amount of time, in nanoseconds, that polling is
+performed before halting.
+
+Default: 200000
+
+2) guest_halt_poll_shrink:
+Division factor used to shrink per-cpu guest_halt_poll_ns when
+wakeup event occurs after the global guest_halt_poll_ns.
+
+Default: 2
+
+3) guest_halt_poll_grow:
+Multiplication factor used to grow per-cpu guest_halt_poll_ns
+when event occurs after per-cpu guest_halt_poll_ns
+but before global guest_halt_poll_ns.
+
+Default: 2
+
+4) guest_halt_poll_grow_start:
+The per-cpu guest_halt_poll_ns eventually reaches zero
+in case of an idle system. This value sets the initial
+per-cpu guest_halt_poll_ns when growing. This can
+be increased from 10000, to avoid misses during the initial
+growth stage:
+
+10000, 20000, 40000, ... (example assumes guest_halt_poll_grow=2).
+
+Default: 10000
+
+5) guest_halt_poll_allow_shrink:
+
+Bool parameter which allows shrinking. Set to N
+to avoid it (per-cpu guest_halt_poll_ns will remain
+high once achieves global guest_halt_poll_ns value).
+
+Default: Y
+
+The module parameters can be set from the debugfs files in:
+
+	/sys/module/cpuidle_haltpoll/parameters/
+
+Further Notes
+=============
+
+- Care should be taken when setting the guest_halt_poll_ns parameter as a
+large value has the potential to drive the cpu usage to 100% on a machine which
+would be almost entirely idle otherwise.
+
Index: linux-2.6.git/arch/x86/kernel/process.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/process.c	2019-05-29 14:46:14.527005582 -0300
+++ linux-2.6.git/arch/x86/kernel/process.c	2019-06-03 19:31:36.004302375 -0300
@@ -580,7 +580,7 @@
 	safe_halt();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 }
-#ifdef CONFIG_APM_MODULE
+#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
 EXPORT_SYMBOL(default_idle);
 #endif
 
Index: linux-2.6.git/drivers/cpuidle/Kconfig
===================================================================
--- linux-2.6.git.orig/drivers/cpuidle/Kconfig	2019-05-29 14:46:14.668006053 -0300
+++ linux-2.6.git/drivers/cpuidle/Kconfig	2019-06-03 19:31:36.004302375 -0300
@@ -51,6 +51,16 @@
 source "drivers/cpuidle/Kconfig.powerpc"
 endmenu
 
+config HALTPOLL_CPUIDLE
+       tristate "Halt poll cpuidle driver"
+       depends on X86
+       default y
+       help
+         This option enables halt poll cpuidle driver, which allows to poll
+         before halting in the guest (more efficient than polling in the
+         host via halt_poll_ns for some scenarios).
+
+
 endif
 
 config ARCH_NEEDS_CPU_IDLE_COUPLED
Index: linux-2.6.git/drivers/cpuidle/Makefile
===================================================================
--- linux-2.6.git.orig/drivers/cpuidle/Makefile	2019-05-29 14:44:43.030700871 -0300
+++ linux-2.6.git/drivers/cpuidle/Makefile	2019-06-03 19:31:36.004302375 -0300
@@ -7,6 +7,7 @@
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES)		  += dt_idle_states.o
 obj-$(CONFIG_ARCH_HAS_CPU_RELAX)	  += poll_state.o
+obj-$(CONFIG_HALTPOLL_CPUIDLE)		  += cpuidle-haltpoll.o
 
 ##################################################################################
 # ARM SoC drivers
Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll-trace.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll-trace.h	2019-06-03 19:31:36.005302378 -0300
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_HALTPOLL_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
+#define _HALTPOLL_TRACE_H_
+
+#include <linux/stringify.h>
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cpuidle_haltpoll
+
+TRACE_EVENT(cpuidle_haltpoll_success,
+	    TP_PROTO(unsigned int cpu_halt_poll_ns, u64 block_ns),
+	    TP_ARGS(cpu_halt_poll_ns, block_ns),
+
+	    TP_STRUCT__entry(
+			     __field(unsigned int, cpu_halt_poll_ns)
+			     __field(u64,	   block_ns)
+			    ),
+
+	    TP_fast_assign(
+			   __entry->cpu_halt_poll_ns = cpu_halt_poll_ns;
+			   __entry->block_ns = block_ns;
+			  ),
+
+	    TP_printk("cpu_halt_poll_ns %u block_ns %lld",
+			  __entry->cpu_halt_poll_ns,
+			  __entry->block_ns)
+);
+
+TRACE_EVENT(cpuidle_haltpoll_fail,
+		TP_PROTO(unsigned int prev_cpu_halt_poll_ns,
+			 unsigned int cpu_halt_poll_ns,
+			 u64 block_ns),
+		TP_ARGS(prev_cpu_halt_poll_ns, cpu_halt_poll_ns, block_ns),
+
+		TP_STRUCT__entry(
+				__field(unsigned int, prev_cpu_halt_poll_ns)
+				__field(unsigned int, cpu_halt_poll_ns)
+				__field(u64,	   block_ns)
+				),
+
+		TP_fast_assign(
+			      __entry->prev_cpu_halt_poll_ns =
+					prev_cpu_halt_poll_ns;
+			      __entry->cpu_halt_poll_ns = cpu_halt_poll_ns;
+			      __entry->block_ns = block_ns;
+			      ),
+
+		TP_printk("prev_cpu_halt_poll_ns %u cpu_halt_poll_ns %u block_ns %lld",
+			      __entry->prev_cpu_halt_poll_ns,
+			      __entry->cpu_halt_poll_ns,
+			      __entry->block_ns)
+);
+
+#endif /* _HALTPOLL_TRACE_H_ */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../drivers/cpuidle/
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cpuidle-haltpoll-trace
+#include <trace/define_trace.h>
+
+
Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:31:36.005302378 -0300
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * cpuidle driver for halt polling.
+ *
+ * Copyright 2019 Red Hat, Inc. and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Authors: Marcelo Tosatti <mtosatti@redhat.com>
+ */
+
+#include <linux/init.h>
+#include <linux/cpuidle.h>
+#include <linux/module.h>
+#include <linux/timekeeping.h>
+#include <linux/sched/idle.h>
+#define CREATE_TRACE_POINTS
+#include "cpuidle-haltpoll-trace.h"
+
+unsigned int guest_halt_poll_ns = 200000;
+module_param(guest_halt_poll_ns, uint, 0644);
+
+/* division factor to shrink halt_poll_ns */
+unsigned int guest_halt_poll_shrink = 2;
+module_param(guest_halt_poll_shrink, uint, 0644);
+
+/* multiplication factor to grow per-cpu halt_poll_ns */
+unsigned int guest_halt_poll_grow = 2;
+module_param(guest_halt_poll_grow, uint, 0644);
+
+/* value in ns to start growing per-cpu halt_poll_ns */
+unsigned int guest_halt_poll_grow_start = 10000;
+module_param(guest_halt_poll_grow_start, uint, 0644);
+
+/* value in ns to start growing per-cpu halt_poll_ns */
+bool guest_halt_poll_allow_shrink = true;
+module_param(guest_halt_poll_allow_shrink, bool, 0644);
+
+static DEFINE_PER_CPU(unsigned int, halt_poll_ns);
+
+static void adjust_haltpoll_ns(unsigned int block_ns,
+			       unsigned int *cpu_halt_poll_ns)
+{
+	unsigned int val;
+	unsigned int prev_halt_poll_ns = *cpu_halt_poll_ns;
+
+	/* Grow cpu_halt_poll_ns if
+	 * cpu_halt_poll_ns < block_ns < guest_halt_poll_ns
+	 */
+	if (block_ns > *cpu_halt_poll_ns && block_ns <= guest_halt_poll_ns) {
+		val = *cpu_halt_poll_ns * guest_halt_poll_grow;
+
+		if (val < guest_halt_poll_grow_start)
+			val = guest_halt_poll_grow_start;
+		if (val > guest_halt_poll_ns)
+			val = guest_halt_poll_ns;
+
+		*cpu_halt_poll_ns = val;
+	} else if (block_ns > guest_halt_poll_ns &&
+		   guest_halt_poll_allow_shrink) {
+		unsigned int shrink = guest_halt_poll_shrink;
+
+		val = *cpu_halt_poll_ns;
+		if (shrink == 0)
+			val = 0;
+		else
+			val /= shrink;
+		*cpu_halt_poll_ns = val;
+	}
+
+	trace_cpuidle_haltpoll_fail(prev_halt_poll_ns, *cpu_halt_poll_ns,
+				    block_ns);
+}
+
+static int haltpoll_enter_idle(struct cpuidle_device *dev,
+			  struct cpuidle_driver *drv, int index)
+{
+	int do_halt = 0;
+	unsigned int *cpu_halt_poll_ns;
+	ktime_t start, now;
+	int cpu = smp_processor_id();
+
+	cpu_halt_poll_ns = per_cpu_ptr(&halt_poll_ns, cpu);
+
+	/* No polling */
+	if (guest_halt_poll_ns == 0) {
+		if (current_clr_polling_and_test()) {
+			local_irq_enable();
+			return index;
+		}
+		default_idle();
+		return index;
+	}
+
+	local_irq_enable();
+
+	now = start = ktime_get();
+	if (!current_set_polling_and_test()) {
+		ktime_t end_spin;
+
+		end_spin = ktime_add_ns(now, *cpu_halt_poll_ns);
+
+		while (!need_resched()) {
+			cpu_relax();
+			now = ktime_get();
+
+			if (!ktime_before(now, end_spin)) {
+				do_halt = 1;
+				break;
+			}
+		}
+	}
+
+	if (do_halt) {
+		u64 block_ns;
+
+		/*
+		 * No events while busy spin window passed,
+		 * halt.
+		 */
+		local_irq_disable();
+		if (current_clr_polling_and_test()) {
+			local_irq_enable();
+			return index;
+		}
+		default_idle();
+		block_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
+		adjust_haltpoll_ns(block_ns, cpu_halt_poll_ns);
+	} else {
+		u64 block_ns = ktime_to_ns(ktime_sub(now, start));
+
+		trace_cpuidle_haltpoll_success(*cpu_halt_poll_ns, block_ns);
+		current_clr_polling();
+	}
+
+	return index;
+}
+
+static struct cpuidle_driver haltpoll_driver = {
+	.name = "haltpoll_idle",
+	.owner = THIS_MODULE,
+	.states = {
+		{ /* entry 0 is for polling */ },
+		{
+			.enter			= haltpoll_enter_idle,
+			.exit_latency		= 0,
+			.target_residency	= 0,
+			.power_usage		= -1,
+			.name			= "Halt poll",
+			.desc			= "Halt poll idle",
+		},
+	},
+	.safe_state_index = 0,
+	.state_count = 2,
+};
+
+static int __init haltpoll_init(void)
+{
+	return cpuidle_register(&haltpoll_driver, NULL);
+}
+
+static void __exit haltpoll_exit(void)
+{
+	cpuidle_unregister(&haltpoll_driver);
+}
+
+module_init(haltpoll_init);
+module_exit(haltpoll_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Marcelo Tosatti <mtosatti@redhat.com>");
+



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch 2/3] kvm: x86: add host poll control msrs
  2019-06-03 22:52 [patch 0/3] cpuidle-haltpoll driver (v2) Marcelo Tosatti
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
@ 2019-06-03 22:52 ` Marcelo Tosatti
  2019-06-06 12:04   ` Paolo Bonzini
  2019-06-03 22:52 ` [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized Marcelo Tosatti
  2019-06-07  9:49 ` [patch 0/3] cpuidle-haltpoll driver (v2) Rafael J. Wysocki
  3 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-03 22:52 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Marcelo Tosatti

[-- Attachment #1: 02-pollcontrol-host.patch --]
[-- Type: text/plain, Size: 5255 bytes --]

Add an MSRs which allows the guest to disable 
host polling (specifically the cpuidle-haltpoll, 
when performing polling in the guest, disables
host side polling).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 Documentation/virtual/kvm/msr.txt    |    9 +++++++++
 arch/x86/include/asm/kvm_host.h      |    2 ++
 arch/x86/include/uapi/asm/kvm_para.h |    2 ++
 arch/x86/kvm/Kconfig                 |    1 +
 arch/x86/kvm/cpuid.c                 |    3 ++-
 arch/x86/kvm/x86.c                   |   23 +++++++++++++++++++++++
 6 files changed, 39 insertions(+), 1 deletion(-)

Index: linux-2.6.git/Documentation/virtual/kvm/msr.txt
===================================================================
--- linux-2.6.git.orig/Documentation/virtual/kvm/msr.txt	2018-05-18 15:40:19.697438928 -0300
+++ linux-2.6.git/Documentation/virtual/kvm/msr.txt	2019-06-03 19:37:49.618543527 -0300
@@ -273,3 +273,12 @@
 	guest must both read the least significant bit in the memory area and
 	clear it using a single CPU instruction, such as test and clear, or
 	compare and exchange.
+
+MSR_KVM_POLL_CONTROL: 0x4b564d05
+	Control host side polling.
+
+	data: Bit 0 enables (1) or disables (0) host halt poll
+	logic.
+	KVM guests can disable host halt polling when performing
+	polling themselves.
+
Index: linux-2.6.git/arch/x86/include/asm/kvm_host.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/kvm_host.h	2019-05-29 14:46:14.516005546 -0300
+++ linux-2.6.git/arch/x86/include/asm/kvm_host.h	2019-06-03 19:37:49.619543530 -0300
@@ -755,6 +755,8 @@
 		struct gfn_to_hva_cache data;
 	} pv_eoi;
 
+	u64 msr_kvm_poll_control;
+
 	/*
 	 * Indicate whether the access faults on its page table in guest
 	 * which is set when fix page fault and used to detect unhandeable
Index: linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/uapi/asm/kvm_para.h	2019-01-04 12:07:15.936947406 -0200
+++ linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h	2019-06-03 19:37:49.620543534 -0300
@@ -29,6 +29,7 @@
 #define KVM_FEATURE_PV_TLB_FLUSH	9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
 #define KVM_FEATURE_PV_SEND_IPI	11
+#define KVM_FEATURE_POLL_CONTROL	12
 
 #define KVM_HINTS_REALTIME      0
 
@@ -47,6 +48,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_POLL_CONTROL	0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
Index: linux-2.6.git/arch/x86/kvm/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/Kconfig	2019-05-29 14:46:14.530005593 -0300
+++ linux-2.6.git/arch/x86/kvm/Kconfig	2019-06-03 19:37:49.620543534 -0300
@@ -41,6 +41,7 @@
 	select PERF_EVENTS
 	select HAVE_KVM_MSI
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
+	select HAVE_KVM_NO_POLL
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select KVM_VFIO
 	select SRCU
Index: linux-2.6.git/arch/x86/kvm/cpuid.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/cpuid.c	2019-05-29 14:46:14.530005593 -0300
+++ linux-2.6.git/arch/x86/kvm/cpuid.c	2019-06-03 19:37:49.621543537 -0300
@@ -643,7 +643,8 @@
 			     (1 << KVM_FEATURE_PV_UNHALT) |
 			     (1 << KVM_FEATURE_PV_TLB_FLUSH) |
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
-			     (1 << KVM_FEATURE_PV_SEND_IPI);
+			     (1 << KVM_FEATURE_PV_SEND_IPI) |
+			     (1 << KVM_FEATURE_POLL_CONTROL);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
Index: linux-2.6.git/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/x86.c	2019-05-29 14:46:14.537005616 -0300
+++ linux-2.6.git/arch/x86/kvm/x86.c	2019-06-03 19:37:49.624543547 -0300
@@ -1177,6 +1177,7 @@
 	MSR_IA32_POWER_CTL,
 
 	MSR_K7_HWCR,
+	MSR_KVM_POLL_CONTROL,
 };
 
 static unsigned num_emulated_msrs;
@@ -2628,6 +2629,14 @@
 			return 1;
 		break;
 
+	case MSR_KVM_POLL_CONTROL:
+		/* only enable bit supported */
+		if (data & (-1ULL << 1))
+			return 1;
+
+		vcpu->arch.msr_kvm_poll_control = data;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -2877,6 +2886,9 @@
 	case MSR_KVM_PV_EOI_EN:
 		msr_info->data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_POLL_CONTROL:
+		msr_info->data = vcpu->arch.msr_kvm_poll_control;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -8874,6 +8886,10 @@
 	msr.host_initiated = true;
 	kvm_write_tsc(vcpu, &msr);
 	vcpu_put(vcpu);
+
+	/* poll control enabled by default */
+	vcpu->arch.msr_kvm_poll_control = 1;
+
 	mutex_unlock(&vcpu->mutex);
 
 	if (!kvmclock_periodic_sync)
@@ -9948,6 +9964,13 @@
 }
 EXPORT_SYMBOL_GPL(kvm_vector_hashing_enabled);
 
+bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
+
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-03 22:52 [patch 0/3] cpuidle-haltpoll driver (v2) Marcelo Tosatti
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
  2019-06-03 22:52 ` [patch 2/3] kvm: x86: add host poll control msrs Marcelo Tosatti
@ 2019-06-03 22:52 ` Marcelo Tosatti
  2019-06-04  1:26   ` kbuild test robot
  2019-06-04 12:24   ` [patch v2 " Marcelo Tosatti
  2019-06-07  9:49 ` [patch 0/3] cpuidle-haltpoll driver (v2) Rafael J. Wysocki
  3 siblings, 2 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-03 22:52 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Marcelo Tosatti

[-- Attachment #1: 03-pollcontrol-guest.patch --]
[-- Type: text/plain, Size: 4652 bytes --]

When performing guest side polling, it is not necessary to 
also perform host side polling. 

So disable host side polling, via the new MSR interface, 
when loading cpuidle-haltpoll driver.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/Kconfig                        |    7 +++++
 arch/x86/include/asm/cpuidle_haltpoll.h |    8 ++++++
 arch/x86/kernel/kvm.c                   |   40 ++++++++++++++++++++++++++++++++
 drivers/cpuidle/cpuidle-haltpoll.c      |    9 ++++++-
 include/linux/cpuidle_haltpoll.h        |   16 ++++++++++++
 5 files changed, 79 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h	2019-06-03 19:38:42.328718617 -0300
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARCH_HALTPOLL_H
+#define _ARCH_HALTPOLL_H
+
+void arch_haltpoll_enable(void);
+void arch_haltpoll_disable(void);
+
+#endif
Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
===================================================================
--- linux-2.6.git.orig/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:42.328718617 -0300
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/timekeeping.h>
 #include <linux/sched/idle.h>
+#include <linux/cpuidle_haltpoll.h>
 #define CREATE_TRACE_POINTS
 #include "cpuidle-haltpoll-trace.h"
 
@@ -157,11 +158,17 @@
 
 static int __init haltpoll_init(void)
 {
-	return cpuidle_register(&haltpoll_driver, NULL);
+	int ret = cpuidle_register(&haltpoll_driver, NULL);
+
+	if (ret == 0)
+		arch_haltpoll_enable();
+
+	return ret;
 }
 
 static void __exit haltpoll_exit(void)
 {
+	arch_haltpoll_disable();
 	cpuidle_unregister(&haltpoll_driver);
 }
 
Index: linux-2.6.git/include/linux/cpuidle_haltpoll.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/include/linux/cpuidle_haltpoll.h	2019-06-03 19:41:57.293366260 -0300
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _CPUIDLE_HALTPOLL_H
+#define _CPUIDLE_HALTPOLL_H
+
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+#include <asm/cpuidle_haltpoll.h>
+#else
+static inline void arch_haltpoll_enable(void)
+{
+}
+
+static inline void arch_haltpoll_disable(void)
+{
+}
+#endif
+#endif
Index: linux-2.6.git/arch/x86/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/Kconfig	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/arch/x86/Kconfig	2019-06-03 19:42:34.478489868 -0300
@@ -787,6 +787,7 @@
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_CPUIDLE_HALTPOLL
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -795,6 +796,12 @@
 	  underlying device model, the host provides the guest with
 	  timing infrastructure such as time of day, and system time
 
+config ARCH_CPUIDLE_HALTPOLL
+        def_bool n
+        prompt "Disable host haltpoll when loading haltpoll driver"
+        help
+	  If virtualized under KVM, disable host haltpoll.
+
 config PVH
 	bool "Support for running PVH guests"
 	---help---
Index: linux-2.6.git/arch/x86/kernel/kvm.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/kvm.c	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/arch/x86/kernel/kvm.c	2019-06-03 19:40:14.359024312 -0300
@@ -853,3 +853,43 @@
 }
 
 #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
+
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+
+void kvm_disable_host_haltpoll(void *i)
+{
+	wrmsrl(MSR_KVM_POLL_CONTROL, 0);
+}
+
+void kvm_enable_host_haltpoll(void *i)
+{
+	wrmsrl(MSR_KVM_POLL_CONTROL, 1);
+}
+
+void arch_haltpoll_enable(void)
+{
+	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
+		return;
+
+	preempt_disable();
+	/* Enabling guest halt poll disables host halt poll */
+	kvm_disable_host_haltpoll(NULL);
+	smp_call_function(kvm_disable_host_haltpoll, NULL, 1);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
+
+void arch_haltpoll_disable(void)
+{
+	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
+		return;
+
+	preempt_disable();
+	/* Enabling guest halt poll disables host halt poll */
+	kvm_enable_host_haltpoll(NULL);
+	smp_call_function(kvm_enable_host_haltpoll, NULL, 1);
+	preempt_enable();
+}
+}
+EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
+#endif



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-03 22:52 ` [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized Marcelo Tosatti
@ 2019-06-04  1:26   ` kbuild test robot
  2019-06-04 12:24   ` [patch v2 " Marcelo Tosatti
  1 sibling, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2019-06-04  1:26 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kbuild-all, kvm-devel, Paolo Bonzini,
	Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1419 bytes --]

Hi Marcelo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v5.2-rc3 next-20190603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/cpuidle-haltpoll-driver-v2/20190604-083002
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/x86//kernel/kvm.c:906:1: error: expected identifier or '(' before '}' token
    }
    ^

vim +906 arch/x86//kernel/kvm.c

   894	
   895	void arch_haltpoll_disable(void)
   896	{
   897		if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
   898			return;
   899	
   900		preempt_disable();
   901		/* Enabling guest halt poll disables host halt poll */
   902		kvm_enable_host_haltpoll(NULL);
   903		smp_call_function(kvm_enable_host_haltpoll, NULL, 1);
   904		preempt_enable();
   905	}
 > 906	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68871 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-03 22:52 ` [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized Marcelo Tosatti
  2019-06-04  1:26   ` kbuild test robot
@ 2019-06-04 12:24   ` Marcelo Tosatti
  2019-06-06 18:25     ` Joao Martins
  1 sibling, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-04 12:24 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krcmar, Andrea Arcangeli, Rafael J. Wysocki,
	Peter Zijlstra, Wanpeng Li, Konrad Rzeszutek Wilk,
	Raslan KarimAllah, Boris Ostrovsky, Ankur Arora,
	Christian Borntraeger


When performing guest side polling, it is not necessary to 
also perform host side polling. 

So disable host side polling, via the new MSR interface, 
when loading cpuidle-haltpoll driver.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v2: remove extra "}"

 arch/x86/Kconfig                        |    7 +++++
 arch/x86/include/asm/cpuidle_haltpoll.h |    8 ++++++
 arch/x86/kernel/kvm.c                   |   40 ++++++++++++++++++++++++++++++++
 drivers/cpuidle/cpuidle-haltpoll.c      |    9 ++++++-
 include/linux/cpuidle_haltpoll.h        |   16 ++++++++++++
 5 files changed, 79 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h	2019-06-03 19:38:42.328718617 -0300
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARCH_HALTPOLL_H
+#define _ARCH_HALTPOLL_H
+
+void arch_haltpoll_enable(void);
+void arch_haltpoll_disable(void);
+
+#endif
Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
===================================================================
--- linux-2.6.git.orig/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:42.328718617 -0300
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/timekeeping.h>
 #include <linux/sched/idle.h>
+#include <linux/cpuidle_haltpoll.h>
 #define CREATE_TRACE_POINTS
 #include "cpuidle-haltpoll-trace.h"
 
@@ -157,11 +158,17 @@
 
 static int __init haltpoll_init(void)
 {
-	return cpuidle_register(&haltpoll_driver, NULL);
+	int ret = cpuidle_register(&haltpoll_driver, NULL);
+
+	if (ret == 0)
+		arch_haltpoll_enable();
+
+	return ret;
 }
 
 static void __exit haltpoll_exit(void)
 {
+	arch_haltpoll_disable();
 	cpuidle_unregister(&haltpoll_driver);
 }
 
Index: linux-2.6.git/include/linux/cpuidle_haltpoll.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/include/linux/cpuidle_haltpoll.h	2019-06-03 19:41:57.293366260 -0300
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _CPUIDLE_HALTPOLL_H
+#define _CPUIDLE_HALTPOLL_H
+
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+#include <asm/cpuidle_haltpoll.h>
+#else
+static inline void arch_haltpoll_enable(void)
+{
+}
+
+static inline void arch_haltpoll_disable(void)
+{
+}
+#endif
+#endif
Index: linux-2.6.git/arch/x86/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/Kconfig	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/arch/x86/Kconfig	2019-06-03 19:42:34.478489868 -0300
@@ -787,6 +787,7 @@
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_CPUIDLE_HALTPOLL
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -795,6 +796,12 @@
 	  underlying device model, the host provides the guest with
 	  timing infrastructure such as time of day, and system time
 
+config ARCH_CPUIDLE_HALTPOLL
+        def_bool n
+        prompt "Disable host haltpoll when loading haltpoll driver"
+        help
+	  If virtualized under KVM, disable host haltpoll.
+
 config PVH
 	bool "Support for running PVH guests"
 	---help---
Index: linux-2.6.git/arch/x86/kernel/kvm.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/kvm.c	2019-06-03 19:38:12.376619124 -0300
+++ linux-2.6.git/arch/x86/kernel/kvm.c	2019-06-03 19:40:14.359024312 -0300
@@ -853,3 +853,43 @@
 }
 
 #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
+
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+
+void kvm_disable_host_haltpoll(void *i)
+{
+	wrmsrl(MSR_KVM_POLL_CONTROL, 0);
+}
+
+void kvm_enable_host_haltpoll(void *i)
+{
+	wrmsrl(MSR_KVM_POLL_CONTROL, 1);
+}
+
+void arch_haltpoll_enable(void)
+{
+	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
+		return;
+
+	preempt_disable();
+	/* Enabling guest halt poll disables host halt poll */
+	kvm_disable_host_haltpoll(NULL);
+	smp_call_function(kvm_disable_host_haltpoll, NULL, 1);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
+
+void arch_haltpoll_disable(void)
+{
+	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
+		return;
+
+	preempt_disable();
+	/* Enabling guest halt poll disables host halt poll */
+	kvm_enable_host_haltpoll(NULL);
+	smp_call_function(kvm_enable_host_haltpoll, NULL, 1);
+	preempt_enable();
+}
+
+EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
+#endif


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
@ 2019-06-05  8:16   ` Ankur Arora
  2019-06-06 17:51   ` Andrea Arcangeli
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Ankur Arora @ 2019-06-05  8:16 UTC (permalink / raw)
  To: Marcelo Tosatti, kvm-devel
  Cc: Paolo Bonzini, Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Christian Borntraeger

On 2019-06-03 3:52 p.m., Marcelo Tosatti wrote:
> The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> amount of time before halting. This provides the following benefits
> to host side polling:
> 
>          1) The POLL flag is set while polling is performed, which allows
>             a remote vCPU to avoid sending an IPI (and the associated
>             cost of handling the IPI) when performing a wakeup.
> 
>          2) The HLT VM-exit cost can be avoided.
> 
> The downside of guest side polling is that polling is performed
> even with other runnable tasks in the host.
> 
> Results comparing halt_poll_ns and server/client application
> where a small packet is ping-ponged:
> 
> host                                        --> 31.33
> halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> 
> For the SAP HANA benchmarks (where idle_spin is a parameter
> of the previous version of the patch, results should be the
> same):
> 
> hpns == halt_poll_ns
> 
>                            idle_spin=0/   idle_spin=800/    idle_spin=0/
>                            hpns=200000    hpns=0            hpns=800000
> DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
> UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)
> 
> ---
>   Documentation/virtual/guest-halt-polling.txt |   78 ++++++++++++
>   arch/x86/kernel/process.c                    |    2
>   drivers/cpuidle/Kconfig                      |   10 +
>   drivers/cpuidle/Makefile                     |    1
>   drivers/cpuidle/cpuidle-haltpoll-trace.h     |   65 ++++++++++
>   drivers/cpuidle/cpuidle-haltpoll.c           |  172 +++++++++++++++++++++++++++
>   6 files changed, 327 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.git/Documentation/virtual/guest-halt-polling.txt
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/Documentation/virtual/guest-halt-polling.txt	2019-06-03 19:31:36.003302371 -0300
> @@ -0,0 +1,78 @@
> +Guest halt polling
> +==================
> +
> +The cpuidle_haltpoll driver allows the guest vcpus to poll for a specified
> +amount of time before halting. This provides the following benefits
> +to host side polling:
> +
> +	1) The POLL flag is set while polling is performed, which allows
> +	   a remote vCPU to avoid sending an IPI (and the associated
> + 	   cost of handling the IPI) when performing a wakeup.
> +
> +	2) The HLT VM-exit cost can be avoided.
Nit: s/HLT// (also in the commit message). We might be using
MWAIT passthrough via default_idle_call()

> +
> +The downside of guest side polling is that polling is performed
> +even with other runnable tasks in the host.
> +
> +The basic logic as follows: A global value, guest_halt_poll_ns,
> +is configured by the user, indicating the maximum amount of
> +time polling is allowed. This value is fixed.
> +
> +Each vcpu has an adjustable guest_halt_poll_ns
> +("per-cpu guest_halt_poll_ns"), which is adjusted by the algorithm
I believe the per-cpu variable is halt_poll_ns (below as well.)


> +in response to events (explained below).
> +
> +Module Parameters
> +=================
> +
> +The cpuidle_haltpoll module has 5 tunable module parameters:
> +
> +1) guest_halt_poll_ns:
> +Maximum amount of time, in nanoseconds, that polling is
> +performed before halting.
> +
> +Default: 200000
> +
> +2) guest_halt_poll_shrink:
> +Division factor used to shrink per-cpu guest_halt_poll_ns when
> +wakeup event occurs after the global guest_halt_poll_ns.
> +
> +Default: 2
> +
> +3) guest_halt_poll_grow:
> +Multiplication factor used to grow per-cpu guest_halt_poll_ns
> +when event occurs after per-cpu guest_halt_poll_ns
> +but before global guest_halt_poll_ns.
> +
> +Default: 2
> +
> +4) guest_halt_poll_grow_start:
> +The per-cpu guest_halt_poll_ns eventually reaches zero
> +in case of an idle system. This value sets the initial
> +per-cpu guest_halt_poll_ns when growing. This can
> +be increased from 10000, to avoid misses during the initial
> +growth stage:
> +
> +10000, 20000, 40000, ... (example assumes guest_halt_poll_grow=2).
> +
> +Default: 10000
> +
> +5) guest_halt_poll_allow_shrink:
> +
> +Bool parameter which allows shrinking. Set to N
> +to avoid it (per-cpu guest_halt_poll_ns will remain
> +high once achieves global guest_halt_poll_ns value).
> +
> +Default: Y
> +
> +The module parameters can be set from the debugfs files in:
> +
> +	/sys/module/cpuidle_haltpoll/parameters/
> +
> +Further Notes
> +=============
> +
> +- Care should be taken when setting the guest_halt_poll_ns parameter as a
> +large value has the potential to drive the cpu usage to 100% on a machine which
> +would be almost entirely idle otherwise.
> +
> Index: linux-2.6.git/arch/x86/kernel/process.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/process.c	2019-05-29 14:46:14.527005582 -0300
> +++ linux-2.6.git/arch/x86/kernel/process.c	2019-06-03 19:31:36.004302375 -0300
> @@ -580,7 +580,7 @@
>   	safe_halt();
>   	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>   }
> -#ifdef CONFIG_APM_MODULE
> +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
>   EXPORT_SYMBOL(default_idle);
>   #endif
>   
> Index: linux-2.6.git/drivers/cpuidle/Kconfig
> ===================================================================
> --- linux-2.6.git.orig/drivers/cpuidle/Kconfig	2019-05-29 14:46:14.668006053 -0300
> +++ linux-2.6.git/drivers/cpuidle/Kconfig	2019-06-03 19:31:36.004302375 -0300
> @@ -51,6 +51,16 @@
>   source "drivers/cpuidle/Kconfig.powerpc"
>   endmenu
>   
> +config HALTPOLL_CPUIDLE
> +       tristate "Halt poll cpuidle driver"
> +       depends on X86
> +       default y
> +       help
> +         This option enables halt poll cpuidle driver, which allows to poll
> +         before halting in the guest (more efficient than polling in the
> +         host via halt_poll_ns for some scenarios).
> +
> +
>   endif
>   
>   config ARCH_NEEDS_CPU_IDLE_COUPLED
> Index: linux-2.6.git/drivers/cpuidle/Makefile
> ===================================================================
> --- linux-2.6.git.orig/drivers/cpuidle/Makefile	2019-05-29 14:44:43.030700871 -0300
> +++ linux-2.6.git/drivers/cpuidle/Makefile	2019-06-03 19:31:36.004302375 -0300
> @@ -7,6 +7,7 @@
>   obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
>   obj-$(CONFIG_DT_IDLE_STATES)		  += dt_idle_states.o
>   obj-$(CONFIG_ARCH_HAS_CPU_RELAX)	  += poll_state.o
> +obj-$(CONFIG_HALTPOLL_CPUIDLE)		  += cpuidle-haltpoll.o
>   
>   ##################################################################################
>   # ARM SoC drivers
> Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll-trace.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll-trace.h	2019-06-03 19:31:36.005302378 -0300
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#if !defined(_HALTPOLL_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
> +#define _HALTPOLL_TRACE_H_
> +
> +#include <linux/stringify.h>
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cpuidle_haltpoll
> +
> +TRACE_EVENT(cpuidle_haltpoll_success,
> +	    TP_PROTO(unsigned int cpu_halt_poll_ns, u64 block_ns),
> +	    TP_ARGS(cpu_halt_poll_ns, block_ns),
> +
> +	    TP_STRUCT__entry(
> +			     __field(unsigned int, cpu_halt_poll_ns)
> +			     __field(u64,	   block_ns)
> +			    ),
> +
> +	    TP_fast_assign(
> +			   __entry->cpu_halt_poll_ns = cpu_halt_poll_ns;
> +			   __entry->block_ns = block_ns;
> +			  ),
> +
> +	    TP_printk("cpu_halt_poll_ns %u block_ns %lld",
> +			  __entry->cpu_halt_poll_ns,
> +			  __entry->block_ns)
> +);
> +
> +TRACE_EVENT(cpuidle_haltpoll_fail,
> +		TP_PROTO(unsigned int prev_cpu_halt_poll_ns,
> +			 unsigned int cpu_halt_poll_ns,
> +			 u64 block_ns),
> +		TP_ARGS(prev_cpu_halt_poll_ns, cpu_halt_poll_ns, block_ns),
> +
> +		TP_STRUCT__entry(
> +				__field(unsigned int, prev_cpu_halt_poll_ns)
> +				__field(unsigned int, cpu_halt_poll_ns)
> +				__field(u64,	   block_ns)
> +				),
> +
> +		TP_fast_assign(
> +			      __entry->prev_cpu_halt_poll_ns =
> +					prev_cpu_halt_poll_ns;
> +			      __entry->cpu_halt_poll_ns = cpu_halt_poll_ns;
> +			      __entry->block_ns = block_ns;
> +			      ),
> +
> +		TP_printk("prev_cpu_halt_poll_ns %u cpu_halt_poll_ns %u block_ns %lld",
> +			      __entry->prev_cpu_halt_poll_ns,
> +			      __entry->cpu_halt_poll_ns,
> +			      __entry->block_ns)
> +);
> +
> +#endif /* _HALTPOLL_TRACE_H_ */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH ../../drivers/cpuidle/
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cpuidle-haltpoll-trace
> +#include <trace/define_trace.h>
> +
> +
> Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:31:36.005302378 -0300
> @@ -0,0 +1,172 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * cpuidle driver for halt polling.
> + *
> + * Copyright 2019 Red Hat, Inc. and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Authors: Marcelo Tosatti <mtosatti@redhat.com>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/cpuidle.h>
> +#include <linux/module.h>
> +#include <linux/timekeeping.h>
> +#include <linux/sched/idle.h>
> +#define CREATE_TRACE_POINTS
> +#include "cpuidle-haltpoll-trace.h"
> +
> +unsigned int guest_halt_poll_ns = 200000;
> +module_param(guest_halt_poll_ns, uint, 0644);
> +
> +/* division factor to shrink halt_poll_ns */
> +unsigned int guest_halt_poll_shrink = 2;
> +module_param(guest_halt_poll_shrink, uint, 0644);
> +
> +/* multiplication factor to grow per-cpu halt_poll_ns */
> +unsigned int guest_halt_poll_grow = 2;
> +module_param(guest_halt_poll_grow, uint, 0644);
> +
> +/* value in ns to start growing per-cpu halt_poll_ns */
> +unsigned int guest_halt_poll_grow_start = 10000;
> +module_param(guest_halt_poll_grow_start, uint, 0644);
> +
> +/* value in ns to start growing per-cpu halt_poll_ns */
> +bool guest_halt_poll_allow_shrink = true;
> +module_param(guest_halt_poll_allow_shrink, bool, 0644);
> +
> +static DEFINE_PER_CPU(unsigned int, halt_poll_ns);
> +
> +static void adjust_haltpoll_ns(unsigned int block_ns,
> +			       unsigned int *cpu_halt_poll_ns)
> +{
> +	unsigned int val;
> +	unsigned int prev_halt_poll_ns = *cpu_halt_poll_ns;
> +
> +	/* Grow cpu_halt_poll_ns if
> +	 * cpu_halt_poll_ns < block_ns < guest_halt_poll_ns
> +	 */
> +	if (block_ns > *cpu_halt_poll_ns && block_ns <= guest_halt_poll_ns) {
> +		val = *cpu_halt_poll_ns * guest_halt_poll_grow;
> +
> +		if (val < guest_halt_poll_grow_start)
> +			val = guest_halt_poll_grow_start;
> +		if (val > guest_halt_poll_ns)
> +			val = guest_halt_poll_ns;
> +
> +		*cpu_halt_poll_ns = val;
> +	} else if (block_ns > guest_halt_poll_ns &&
> +		   guest_halt_poll_allow_shrink) {
> +		unsigned int shrink = guest_halt_poll_shrink;
> +
> +		val = *cpu_halt_poll_ns;
> +		if (shrink == 0)
> +			val = 0;
> +		else
> +			val /= shrink;
Nit, feel free to ignore: we could check for guest_halt_poll_shrink
along with guest_halt_poll_allow_shrink and get rid of this whole
conditional.

Ankur

> +		*cpu_halt_poll_ns = val;
> +	}
> +
> +	trace_cpuidle_haltpoll_fail(prev_halt_poll_ns, *cpu_halt_poll_ns,
> +				    block_ns);
> +}
> +
> +static int haltpoll_enter_idle(struct cpuidle_device *dev,
> +			  struct cpuidle_driver *drv, int index)
> +{
> +	int do_halt = 0;
> +	unsigned int *cpu_halt_poll_ns;
> +	ktime_t start, now;
> +	int cpu = smp_processor_id();
> +
> +	cpu_halt_poll_ns = per_cpu_ptr(&halt_poll_ns, cpu);
> +
> +	/* No polling */
> +	if (guest_halt_poll_ns == 0) {
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +		return index;
> +	}
> +
> +	local_irq_enable();
> +
> +	now = start = ktime_get();
> +	if (!current_set_polling_and_test()) {
> +		ktime_t end_spin;
> +
> +		end_spin = ktime_add_ns(now, *cpu_halt_poll_ns);
> +
> +		while (!need_resched()) {
> +			cpu_relax();
> +			now = ktime_get();
> +
> +			if (!ktime_before(now, end_spin)) {
> +				do_halt = 1;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (do_halt) {
> +		u64 block_ns;
> +
> +		/*
> +		 * No events while busy spin window passed,
> +		 * halt.
> +		 */
> +		local_irq_disable();
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +		block_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
> +		adjust_haltpoll_ns(block_ns, cpu_halt_poll_ns);
> +	} else {
> +		u64 block_ns = ktime_to_ns(ktime_sub(now, start));
> +
> +		trace_cpuidle_haltpoll_success(*cpu_halt_poll_ns, block_ns);
> +		current_clr_polling();
> +	}
> +
> +	return index;
> +}
> +
> +static struct cpuidle_driver haltpoll_driver = {
> +	.name = "haltpoll_idle",
> +	.owner = THIS_MODULE,
> +	.states = {
> +		{ /* entry 0 is for polling */ },
> +		{
> +			.enter			= haltpoll_enter_idle,
> +			.exit_latency		= 0,
> +			.target_residency	= 0,
> +			.power_usage		= -1,
> +			.name			= "Halt poll",
> +			.desc			= "Halt poll idle",
> +		},
> +	},
> +	.safe_state_index = 0,
> +	.state_count = 2,
> +};
> +
> +static int __init haltpoll_init(void)
> +{
> +	return cpuidle_register(&haltpoll_driver, NULL);
> +}
> +
> +static void __exit haltpoll_exit(void)
> +{
> +	cpuidle_unregister(&haltpoll_driver);
> +}
> +
> +module_init(haltpoll_init);
> +module_exit(haltpoll_exit);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Marcelo Tosatti <mtosatti@redhat.com>");
> +
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 2/3] kvm: x86: add host poll control msrs
  2019-06-03 22:52 ` [patch 2/3] kvm: x86: add host poll control msrs Marcelo Tosatti
@ 2019-06-06 12:04   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-06 12:04 UTC (permalink / raw)
  To: Marcelo Tosatti, kvm-devel
  Cc: Radim Krčmář,
	Andrea Arcangeli, Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

> Add an MSRs which allows the guest to disable 
> host polling (specifically the cpuidle-haltpoll, 
> when performing polling in the guest, disables
> host side polling).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Queued this patch while waiting for Rafal's review.

Paolo

> ---
>  Documentation/virtual/kvm/msr.txt    |    9 +++++++++
>  arch/x86/include/asm/kvm_host.h      |    2 ++
>  arch/x86/include/uapi/asm/kvm_para.h |    2 ++
>  arch/x86/kvm/Kconfig                 |    1 +
>  arch/x86/kvm/cpuid.c                 |    3 ++-
>  arch/x86/kvm/x86.c                   |   23 +++++++++++++++++++++++
>  6 files changed, 39 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.git/Documentation/virtual/kvm/msr.txt
> ===================================================================
> --- linux-2.6.git.orig/Documentation/virtual/kvm/msr.txt	2018-05-18 15:40:19.697438928 -0300
> +++ linux-2.6.git/Documentation/virtual/kvm/msr.txt	2019-06-03 19:37:49.618543527 -0300
> @@ -273,3 +273,12 @@
>  	guest must both read the least significant bit in the memory area and
>  	clear it using a single CPU instruction, such as test and clear, or
>  	compare and exchange.
> +
> +MSR_KVM_POLL_CONTROL: 0x4b564d05
> +	Control host side polling.
> +
> +	data: Bit 0 enables (1) or disables (0) host halt poll
> +	logic.
> +	KVM guests can disable host halt polling when performing
> +	polling themselves.
> +
> Index: linux-2.6.git/arch/x86/include/asm/kvm_host.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/kvm_host.h	2019-05-29 14:46:14.516005546 -0300
> +++ linux-2.6.git/arch/x86/include/asm/kvm_host.h	2019-06-03 19:37:49.619543530 -0300
> @@ -755,6 +755,8 @@
>  		struct gfn_to_hva_cache data;
>  	} pv_eoi;
>  
> +	u64 msr_kvm_poll_control;
> +
>  	/*
>  	 * Indicate whether the access faults on its page table in guest
>  	 * which is set when fix page fault and used to detect unhandeable
> Index: linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/include/uapi/asm/kvm_para.h	2019-01-04 12:07:15.936947406 -0200
> +++ linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h	2019-06-03 19:37:49.620543534 -0300
> @@ -29,6 +29,7 @@
>  #define KVM_FEATURE_PV_TLB_FLUSH	9
>  #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
>  #define KVM_FEATURE_PV_SEND_IPI	11
> +#define KVM_FEATURE_POLL_CONTROL	12
>  
>  #define KVM_HINTS_REALTIME      0
>  
> @@ -47,6 +48,7 @@
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN      0x4b564d04
> +#define MSR_KVM_POLL_CONTROL	0x4b564d05
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> Index: linux-2.6.git/arch/x86/kvm/Kconfig
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kvm/Kconfig	2019-05-29 14:46:14.530005593 -0300
> +++ linux-2.6.git/arch/x86/kvm/Kconfig	2019-06-03 19:37:49.620543534 -0300
> @@ -41,6 +41,7 @@
>  	select PERF_EVENTS
>  	select HAVE_KVM_MSI
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
> +	select HAVE_KVM_NO_POLL
>  	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>  	select KVM_VFIO
>  	select SRCU
> Index: linux-2.6.git/arch/x86/kvm/cpuid.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kvm/cpuid.c	2019-05-29 14:46:14.530005593 -0300
> +++ linux-2.6.git/arch/x86/kvm/cpuid.c	2019-06-03 19:37:49.621543537 -0300
> @@ -643,7 +643,8 @@
>  			     (1 << KVM_FEATURE_PV_UNHALT) |
>  			     (1 << KVM_FEATURE_PV_TLB_FLUSH) |
>  			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> -			     (1 << KVM_FEATURE_PV_SEND_IPI);
> +			     (1 << KVM_FEATURE_PV_SEND_IPI) |
> +			     (1 << KVM_FEATURE_POLL_CONTROL);
>  
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> Index: linux-2.6.git/arch/x86/kvm/x86.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kvm/x86.c	2019-05-29 14:46:14.537005616 -0300
> +++ linux-2.6.git/arch/x86/kvm/x86.c	2019-06-03 19:37:49.624543547 -0300
> @@ -1177,6 +1177,7 @@
>  	MSR_IA32_POWER_CTL,
>  
>  	MSR_K7_HWCR,
> +	MSR_KVM_POLL_CONTROL,
>  };
>  
>  static unsigned num_emulated_msrs;
> @@ -2628,6 +2629,14 @@
>  			return 1;
>  		break;
>  
> +	case MSR_KVM_POLL_CONTROL:
> +		/* only enable bit supported */
> +		if (data & (-1ULL << 1))
> +			return 1;
> +
> +		vcpu->arch.msr_kvm_poll_control = data;
> +		break;
> +
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -2877,6 +2886,9 @@
>  	case MSR_KVM_PV_EOI_EN:
>  		msr_info->data = vcpu->arch.pv_eoi.msr_val;
>  		break;
> +	case MSR_KVM_POLL_CONTROL:
> +		msr_info->data = vcpu->arch.msr_kvm_poll_control;
> +		break;
>  	case MSR_IA32_P5_MC_ADDR:
>  	case MSR_IA32_P5_MC_TYPE:
>  	case MSR_IA32_MCG_CAP:
> @@ -8874,6 +8886,10 @@
>  	msr.host_initiated = true;
>  	kvm_write_tsc(vcpu, &msr);
>  	vcpu_put(vcpu);
> +
> +	/* poll control enabled by default */
> +	vcpu->arch.msr_kvm_poll_control = 1;
> +
>  	mutex_unlock(&vcpu->mutex);
>  
>  	if (!kvmclock_periodic_sync)
> @@ -9948,6 +9964,13 @@
>  }
>  EXPORT_SYMBOL_GPL(kvm_vector_hashing_enabled);
>  
> +bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
> +
> +
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
  2019-06-05  8:16   ` Ankur Arora
@ 2019-06-06 17:51   ` Andrea Arcangeli
  2019-06-07 20:20     ` Marcelo Tosatti
  2019-06-06 19:03   ` Peter Zijlstra
  2019-06-07  9:54   ` Rafael J. Wysocki
  3 siblings, 1 reply; 29+ messages in thread
From: Andrea Arcangeli @ 2019-06-06 17:51 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini, Radim KrÄ?máÅ?,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

Hello,

On Mon, Jun 03, 2019 at 07:52:43PM -0300, Marcelo Tosatti wrote:
> +unsigned int guest_halt_poll_ns = 200000;
> +module_param(guest_halt_poll_ns, uint, 0644);
> +
> +/* division factor to shrink halt_poll_ns */
> +unsigned int guest_halt_poll_shrink = 2;
> +module_param(guest_halt_poll_shrink, uint, 0644);
> +
> +/* multiplication factor to grow per-cpu halt_poll_ns */
> +unsigned int guest_halt_poll_grow = 2;
> +module_param(guest_halt_poll_grow, uint, 0644);
> +
> +/* value in ns to start growing per-cpu halt_poll_ns */
> +unsigned int guest_halt_poll_grow_start = 10000;
> +module_param(guest_halt_poll_grow_start, uint, 0644);
> +
> +/* value in ns to start growing per-cpu halt_poll_ns */
> +bool guest_halt_poll_allow_shrink = true;
> +module_param(guest_halt_poll_allow_shrink, bool, 0644);

These variables can all be static. They also should be __read_mostly
to be sure not to unnecessarily hit false sharing while going idle.

> +		while (!need_resched()) {
> +			cpu_relax();
> +			now = ktime_get();
> +
> +			if (!ktime_before(now, end_spin)) {
> +				do_halt = 1;
> +				break;
> +			}
> +		}

On skylake pause takes ~75 cycles with ple_gap=0 (and Marcelo found it
takes 6 cycles with pause loop exiting enabled but that shall be fixed
in the CPU and we can ignore it).

So we could call ktime_get() only once every 100 times or more and
we'd be still accurate down to at least 1usec.

Ideally we'd like a ktime_try_get() that will break the seqcount loop
if read_seqcount_retry fails. Something like below pseudocode:

#define KTIME_ERR ((ktime_t) { .tv64 = 0 })

ktime_t ktime_try_get(void)
{
	[..]
	seq = read_seqcount_begin(&timekeeper_seq);
	secs = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
	nsecs = timekeeping_get_ns(&tk->tkr_mono) +
		tk->wall_to_monotonic.tv_nsec;
	if (unlikely(read_seqcount_retry(&timekeeper_seq, seq)))
		return KTIME_ERR;
	[..]
}

If it ktime_try_get() fails we keep calling it at every iteration of
the loop, when finally it succeeds we call it again only after 100
pause instructions or more. So we continue polling need_resched()
while we wait timerkeeper_seq to be released (and hopefully by looping
100 times or more we'll reduce the frequency when we find
timekeeper_seq locked).

All we care is to react to need_resched ASAP and to have a resolution
of the order of 1usec for the spin time.

If 100 is too wired a new module parameter in __read_mostly section
configured to 100 or more by default, sounds preferable than hitting
every 75nsec on the timekeeper_seq seqcount cacheline.

I doubt it'd make any measurable difference with a few vcpus, but with
hundred of host CPUs and vcpus perhaps it's worth it.

This of course can be done later once the patch is merged and if it's
confirmed the above makes sense in practice and not just in theory. I
wouldn't want to delay the merging for a possible micro optimization.

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-04 12:24   ` [patch v2 " Marcelo Tosatti
@ 2019-06-06 18:25     ` Joao Martins
  2019-06-06 18:36       ` Andrea Arcangeli
  2019-06-07 20:25       ` Marcelo Tosatti
  0 siblings, 2 replies; 29+ messages in thread
From: Joao Martins @ 2019-06-06 18:25 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini, Radim Krcmar, Andrea Arcangeli,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On 6/4/19 1:24 PM, Marcelo Tosatti wrote:
> 
> When performing guest side polling, it is not necessary to 
> also perform host side polling. 
> 
> So disable host side polling, via the new MSR interface, 
> when loading cpuidle-haltpoll driver.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
> 
> v2: remove extra "}"
> 
>  arch/x86/Kconfig                        |    7 +++++
>  arch/x86/include/asm/cpuidle_haltpoll.h |    8 ++++++
>  arch/x86/kernel/kvm.c                   |   40 ++++++++++++++++++++++++++++++++
>  drivers/cpuidle/cpuidle-haltpoll.c      |    9 ++++++-
>  include/linux/cpuidle_haltpoll.h        |   16 ++++++++++++
>  5 files changed, 79 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h	2019-06-03 19:38:42.328718617 -0300
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ARCH_HALTPOLL_H
> +#define _ARCH_HALTPOLL_H
> +
> +void arch_haltpoll_enable(void);
> +void arch_haltpoll_disable(void);
> +
> +#endif
> Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
> ===================================================================
> --- linux-2.6.git.orig/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:12.376619124 -0300
> +++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:42.328718617 -0300
> @@ -15,6 +15,7 @@
>  #include <linux/module.h>
>  #include <linux/timekeeping.h>
>  #include <linux/sched/idle.h>
> +#include <linux/cpuidle_haltpoll.h>
>  #define CREATE_TRACE_POINTS
>  #include "cpuidle-haltpoll-trace.h"
>  
> @@ -157,11 +158,17 @@
>  
>  static int __init haltpoll_init(void)
>  {
> -	return cpuidle_register(&haltpoll_driver, NULL);
> +	int ret = cpuidle_register(&haltpoll_driver, NULL);
> +
> +	if (ret == 0)
> +		arch_haltpoll_enable();
> +
> +	return ret;
>  }
>  
>  static void __exit haltpoll_exit(void)
>  {
> +	arch_haltpoll_disable();
>  	cpuidle_unregister(&haltpoll_driver);
>  }
>  
> Index: linux-2.6.git/include/linux/cpuidle_haltpoll.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/include/linux/cpuidle_haltpoll.h	2019-06-03 19:41:57.293366260 -0300
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _CPUIDLE_HALTPOLL_H
> +#define _CPUIDLE_HALTPOLL_H
> +
> +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
> +#include <asm/cpuidle_haltpoll.h>
> +#else
> +static inline void arch_haltpoll_enable(void)
> +{
> +}
> +
> +static inline void arch_haltpoll_disable(void)
> +{
> +}
> +#endif
> +#endif
> Index: linux-2.6.git/arch/x86/Kconfig
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/Kconfig	2019-06-03 19:38:12.376619124 -0300
> +++ linux-2.6.git/arch/x86/Kconfig	2019-06-03 19:42:34.478489868 -0300
> @@ -787,6 +787,7 @@
>  	bool "KVM Guest support (including kvmclock)"
>  	depends on PARAVIRT
>  	select PARAVIRT_CLOCK
> +	select ARCH_CPUIDLE_HALTPOLL
>  	default y
>  	---help---
>  	  This option enables various optimizations for running under the KVM
> @@ -795,6 +796,12 @@
>  	  underlying device model, the host provides the guest with
>  	  timing infrastructure such as time of day, and system time
>  
> +config ARCH_CPUIDLE_HALTPOLL
> +        def_bool n
> +        proUmpt "Disable host haltpoll when loading haltpoll driver"
> +        help
> +	  If virtualized under KVM, disable host haltpoll.
> +
>  config PVH
>  	bool "Support for running PVH guests"
>  	---help---
> Index: linux-2.6.git/arch/x86/kernel/kvm.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/kvm.c	2019-06-03 19:38:12.376619124 -0300
> +++ linux-2.6.git/arch/x86/kernel/kvm.c	2019-06-03 19:40:14.359024312 -0300
> @@ -853,3 +853,43 @@
>  }
>  
>  #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
> +
> +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
> +
> +void kvm_disable_host_haltpoll(void *i)
> +{
> +	wrmsrl(MSR_KVM_POLL_CONTROL, 0);
> +}
> +
> +void kvm_enable_host_haltpoll(void *i)
> +{
> +	wrmsrl(MSR_KVM_POLL_CONTROL, 1);
> +}
> +
> +void arch_haltpoll_enable(void)
> +{
> +	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
> +		return;
> +

Perhaps warn the user when failing to disable host poll e.g.:

if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
	pr_warn_once("haltpoll: Failed to disable host halt polling\n");
	return;
}

But I wonder whether we should fail to load cpuidle-haltpoll when host halt
polling can't be disabled[*]? That is to avoid polling in both host and guest
and *possibly* avoid chances for performance regressions when running on older
hypervisors?

[*] with guest still able load with lack of host polling control via modparam

> +	preempt_disable();
> +	/* Enabling guest halt poll disables host halt poll */
> +	kvm_disable_host_haltpoll(NULL);
> +	smp_call_function(kvm_disable_host_haltpoll, NULL, 1);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
> +
> +void arch_haltpoll_disable(void)
> +{
> +	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
> +		return;
> +
> +	preempt_disable();
> +	/* Enabling guest halt poll disables host halt poll */
> +	kvm_enable_host_haltpoll(NULL);
> +	smp_call_function(kvm_enable_host_haltpoll, NULL, 1);
> +	preempt_enable();
> +}
> +
> +EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
> +#endif
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 18:25     ` Joao Martins
@ 2019-06-06 18:36       ` Andrea Arcangeli
  2019-06-06 18:51         ` Joao Martins
  2019-06-07 20:38         ` Marcelo Tosatti
  2019-06-07 20:25       ` Marcelo Tosatti
  1 sibling, 2 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2019-06-06 18:36 UTC (permalink / raw)
  To: Joao Martins
  Cc: Marcelo Tosatti, kvm-devel, Paolo Bonzini, Radim Krcmar,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

Hello,

On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
> But I wonder whether we should fail to load cpuidle-haltpoll when host halt
> polling can't be disabled[*]? That is to avoid polling in both host and guest
> and *possibly* avoid chances for performance regressions when running on older
> hypervisors?

I don't think it's necessary: that would force an upgrade of the host
KVM version in order to use the guest haltpoll feature with an
upgraded guest kernel that can use the guest haltpoll.

The guest haltpoll is self contained in the guest, so there's no
reason to prevent that by design or to force upgrade of the KVM host
version. It'd be more than enough to reload kvm.ko in the host with
the host haltpoll set to zero with the module parameter already
available, to achieve the same runtime without requiring a forced host
upgrade.

The warning however sounds sensible.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 18:36       ` Andrea Arcangeli
@ 2019-06-06 18:51         ` Joao Martins
  2019-06-06 19:22           ` Joao Martins
  2019-06-07 20:38         ` Marcelo Tosatti
  1 sibling, 1 reply; 29+ messages in thread
From: Joao Martins @ 2019-06-06 18:51 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Marcelo Tosatti, kvm-devel, Paolo Bonzini, Radim Krcmar,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On 6/6/19 7:36 PM, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
>> But I wonder whether we should fail to load cpuidle-haltpoll when host halt
>> polling can't be disabled[*]? That is to avoid polling in both host and guest
>> and *possibly* avoid chances for performance regressions when running on older
>> hypervisors?
> 
> I don't think it's necessary: that would force an upgrade of the host
> KVM version in order to use the guest haltpoll feature with an
> upgraded guest kernel that can use the guest haltpoll.
> 

Hence why I was suggesting a *guest* cpuidle-haltpoll module parameter to still
allow it to load or otherwise (or allow guest to pick).

> The guest haltpoll is self contained in the guest, so there's no
> reason to prevent that by design or to force upgrade of the KVM host
> version. It'd be more than enough to reload kvm.ko in the host with
> the host haltpoll set to zero with the module parameter already
> available, to achieve the same runtime without requiring a forced host
> upgrade.
> 
It's just with the new driver we unilaterally poll on both sides, just felt I
would point it out should this raise unattended performance side effects ;)

> The warning however sounds sensible.
> 

Cool.

	Joao

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
  2019-06-05  8:16   ` Ankur Arora
  2019-06-06 17:51   ` Andrea Arcangeli
@ 2019-06-06 19:03   ` Peter Zijlstra
  2019-06-07  9:54   ` Rafael J. Wysocki
  3 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2019-06-06 19:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini, Radim KrÄ?máÅ?,
	Andrea Arcangeli, Rafael J. Wysocki, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On Mon, Jun 03, 2019 at 07:52:43PM -0300, Marcelo Tosatti wrote:
> +static int haltpoll_enter_idle(struct cpuidle_device *dev,
> +			  struct cpuidle_driver *drv, int index)
> +{
> +	int do_halt = 0;
> +	unsigned int *cpu_halt_poll_ns;
> +	ktime_t start, now;
> +	int cpu = smp_processor_id();
> +
> +	cpu_halt_poll_ns = per_cpu_ptr(&halt_poll_ns, cpu);
> +
> +	/* No polling */
> +	if (guest_halt_poll_ns == 0) {
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +		return index;
> +	}
> +
> +	local_irq_enable();
> +
> +	now = start = ktime_get();
> +	if (!current_set_polling_and_test()) {
> +		ktime_t end_spin;
> +
> +		end_spin = ktime_add_ns(now, *cpu_halt_poll_ns);
> +
> +		while (!need_resched()) {
> +			cpu_relax();
> +			now = ktime_get();
> +
> +			if (!ktime_before(now, end_spin)) {
> +				do_halt = 1;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (do_halt) {
> +		u64 block_ns;
> +
> +		/*
> +		 * No events while busy spin window passed,
> +		 * halt.
> +		 */
> +		local_irq_disable();
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +		block_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
> +		adjust_haltpoll_ns(block_ns, cpu_halt_poll_ns);
> +	} else {
> +		u64 block_ns = ktime_to_ns(ktime_sub(now, start));
> +
> +		trace_cpuidle_haltpoll_success(*cpu_halt_poll_ns, block_ns);
> +		current_clr_polling();
> +	}
> +
> +	return index;
> +}

You might want to look at using sched_clock() here instead of
ktime_get(). ktime_get() can get _very_ expensive when it drops back to
HPET or things like that, where sched_clock() will always keep using
TSC, even when it is not globally synchronized.

(and since this code runs with preemption disabled, we don't care about
the clock being globally sane)


So something like this:

	start = sched_clock();
	if (current_set_polling_and_test()) {
		local_irq_enable();
		goto out;
	}

	local_irq_enable();
	for (;;) {
		if (need_resched()) {
			current_clr_polling();
			trace_..();
			goto out;
		}

		now = sched_clock();
		if (now - start > cpu_halt_poll_ns)
			break;

		cpu_relax();
	}

	local_irq_disable();
	if (current_clr_polling_and_test()) {
		local_irq_enable();
		goto out;
	}

	default_idle();
	block_ns = sched_clock() - start;
	adjust_haltpoll_ns(block_ns, cpu_halt_poll_ns);

out:
	return index;


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 18:51         ` Joao Martins
@ 2019-06-06 19:22           ` Joao Martins
  2019-06-06 21:01             ` Andrea Arcangeli
  0 siblings, 1 reply; 29+ messages in thread
From: Joao Martins @ 2019-06-06 19:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Marcelo Tosatti, kvm-devel, Paolo Bonzini, Radim Krcmar,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On 6/6/19 7:51 PM, Joao Martins wrote:
> On 6/6/19 7:36 PM, Andrea Arcangeli wrote:
>> Hello,
>>
>> On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
>>> But I wonder whether we should fail to load cpuidle-haltpoll when host halt
>>> polling can't be disabled[*]? That is to avoid polling in both host and guest
>>> and *possibly* avoid chances for performance regressions when running on older
>>> hypervisors?
>>
>> I don't think it's necessary: that would force an upgrade of the host
>> KVM version in order to use the guest haltpoll feature with an
>> upgraded guest kernel that can use the guest haltpoll.
>>
> Hence why I was suggesting a *guest* cpuidle-haltpoll module parameter to still
> allow it to load or otherwise (or allow guest to pick).
> 
By 'still allow it to load', I meant specifically to handle the case when host
polling control is not supported and what to do in that case.

>> The guest haltpoll is self contained in the guest, so there's no
>> reason to prevent that by design or to force upgrade of the KVM host
>> version. It'd be more than enough to reload kvm.ko in the host with
>> the host haltpoll set to zero with the module parameter already
>> available, to achieve the same runtime without requiring a forced host
>> upgrade.
>>
> It's just with the new driver we unilaterally poll on both sides, just felt I
> would point it out should this raise unattended performance side effects ;)
> 
To be clear: by 'unilaterally' I was trying to refer to hosts KVM without
polling control (which is safe to say that it is the majority atm?).

Alternatively, there's always the option that if guest sees any issues on that
case (with polling on both sides=, that it can always blacklist
cpuidle-haltpoll. But may this is not an issue and perhaps majority of users
still observes benefit when polling is enabled on guest and host.

>> The warning however sounds sensible.
>>
> 
> Cool.
> 
> 	Joao
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 19:22           ` Joao Martins
@ 2019-06-06 21:01             ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2019-06-06 21:01 UTC (permalink / raw)
  To: Joao Martins
  Cc: Marcelo Tosatti, kvm-devel, Paolo Bonzini, Radim Krcmar,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On Thu, Jun 06, 2019 at 08:22:40PM +0100, Joao Martins wrote:
> On 6/6/19 7:51 PM, Joao Martins wrote:
> > On 6/6/19 7:36 PM, Andrea Arcangeli wrote:
> >> Hello,
> >>
> >> On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
> >>> But I wonder whether we should fail to load cpuidle-haltpoll when host halt
> >>> polling can't be disabled[*]? That is to avoid polling in both host and guest
> >>> and *possibly* avoid chances for performance regressions when running on older
> >>> hypervisors?
> >>
> >> I don't think it's necessary: that would force an upgrade of the host
> >> KVM version in order to use the guest haltpoll feature with an
> >> upgraded guest kernel that can use the guest haltpoll.
> >>
> > Hence why I was suggesting a *guest* cpuidle-haltpoll module parameter to still
> > allow it to load or otherwise (or allow guest to pick).
> > 
> By 'still allow it to load', I meant specifically to handle the case when host
> polling control is not supported and what to do in that case.

All right, we could add a force=1 parameter to force loading as an
opt-in (and fail load by default with force=0).

> >> The guest haltpoll is self contained in the guest, so there's no
> >> reason to prevent that by design or to force upgrade of the KVM host
> >> version. It'd be more than enough to reload kvm.ko in the host with
> >> the host haltpoll set to zero with the module parameter already
> >> available, to achieve the same runtime without requiring a forced host
> >> upgrade.
> >>
> > It's just with the new driver we unilaterally poll on both sides, just felt I
> > would point it out should this raise unattended performance side effects ;)
> > 
> To be clear: by 'unilaterally' I was trying to refer to hosts KVM without
> polling control (which is safe to say that it is the majority atm?).
> 
> Alternatively, there's always the option that if guest sees any issues on that
> case (with polling on both sides=, that it can always blacklist
> cpuidle-haltpoll. But may this is not an issue and perhaps majority of users
> still observes benefit when polling is enabled on guest and host.

It should be workload dependent if it increases performance to
haltpoll both in both host kernel and guest, the only sure cons is
that it'd burn some more energy..

By default the cpuidle-haltpoll driver shouldn't get loaded if it's
built as a module (the expectation is the default will be =m), and it
can still easily disabled with rmmod if it hurts performance.

So the policy if to activate guest haltpoll or not will still reside
in the guest userland: the guest userland code or guest admin, without
a force parameter has to decide if to load or not to load the module,
with the force=1 parameter it'll have to decide if to load it with =0
or =1 (or load it first with =0 and then decide if to try to load it
again with =1 which would be the benefit the force parameter
provides). To decide if to load or not to load it, the guest userland
could check if there's no support for disabling the host haltpoll,
which can be verified with rdmsr too.

# rdmsr 0x4b564d05
rdmsr: CPU 0 cannot read MSR 0x4b564d05

Adding a force=1 parameter to force loading will just add a few lines
more of kernel code, I'm neutral on that but it looks fine either
ways.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-03 22:52 [patch 0/3] cpuidle-haltpoll driver (v2) Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2019-06-03 22:52 ` [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized Marcelo Tosatti
@ 2019-06-07  9:49 ` Rafael J. Wysocki
  2019-06-07 17:16   ` Marcelo Tosatti
  2019-06-10 14:59   ` Marcelo Tosatti
  3 siblings, 2 replies; 29+ messages in thread
From: Rafael J. Wysocki @ 2019-06-07  9:49 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini,
	Radim Krčmář,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM, Peter Zijlstra

On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> amount of time before halting. This provides the following benefits
> to host side polling:
>
>          1) The POLL flag is set while polling is performed, which allows
>             a remote vCPU to avoid sending an IPI (and the associated
>             cost of handling the IPI) when performing a wakeup.
>
>          2) The HLT VM-exit cost can be avoided.
>
> The downside of guest side polling is that polling is performed
> even with other runnable tasks in the host.
>
> Results comparing halt_poll_ns and server/client application
> where a small packet is ping-ponged:
>
> host                                        --> 31.33
> halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
>
> For the SAP HANA benchmarks (where idle_spin is a parameter
> of the previous version of the patch, results should be the
> same):
>
> hpns == halt_poll_ns
>
>                            idle_spin=0/   idle_spin=800/    idle_spin=0/
>                            hpns=200000    hpns=0            hpns=800000
> DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
> UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)
>
> V2:
>
> - Move from x86 to generic code (Paolo/Christian).
> - Add auto-tuning logic (Paolo).
> - Add MSR to disable host side polling (Paolo).
>
>
>
First of all, please CC power management patches (including cpuidle, 
cpufreq etc) to linux-pm@vger.kernel.org (there are people on that list 
who may want to see your changes before they go in) and CC cpuidle 
material (in particular) to Peter Zijlstra.

Second, I'm not a big fan of this approach to be honest, as it kind of 
is a driver trying to play the role of a governor.

We have a "polling state" already that could be used here in principle 
so I wonder what would be wrong with that.  Also note that there seems 
to be at least some code duplication between your code and the "polling 
state" implementation, so maybe it would be possible to do some things 
in a common way?



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
                     ` (2 preceding siblings ...)
  2019-06-06 19:03   ` Peter Zijlstra
@ 2019-06-07  9:54   ` Rafael J. Wysocki
  3 siblings, 0 replies; 29+ messages in thread
From: Rafael J. Wysocki @ 2019-06-07  9:54 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm-devel, Paolo Bonzini,
	Radim Krčmář,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:

It is rather inconvenient to comment patches posted as attachments.

Anyway, IMO adding trace events at this point is premature, please move 
adding them to a separate patch at the end of the series.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-07  9:49 ` [patch 0/3] cpuidle-haltpoll driver (v2) Rafael J. Wysocki
@ 2019-06-07 17:16   ` Marcelo Tosatti
  2019-06-07 18:22     ` Paolo Bonzini
  2019-06-10 14:59   ` Marcelo Tosatti
  1 sibling, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-07 17:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: kvm-devel, Paolo Bonzini, Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> >amount of time before halting. This provides the following benefits
> >to host side polling:
> >
> >         1) The POLL flag is set while polling is performed, which allows
> >            a remote vCPU to avoid sending an IPI (and the associated
> >            cost of handling the IPI) when performing a wakeup.
> >
> >         2) The HLT VM-exit cost can be avoided.
> >
> >The downside of guest side polling is that polling is performed
> >even with other runnable tasks in the host.
> >
> >Results comparing halt_poll_ns and server/client application
> >where a small packet is ping-ponged:
> >
> >host                                        --> 31.33
> >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> >
> >For the SAP HANA benchmarks (where idle_spin is a parameter
> >of the previous version of the patch, results should be the
> >same):
> >
> >hpns == halt_poll_ns
> >
> >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> >                           hpns=200000    hpns=0            hpns=800000
> >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
> >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)
> >
> >V2:
> >
> >- Move from x86 to generic code (Paolo/Christian).
> >- Add auto-tuning logic (Paolo).
> >- Add MSR to disable host side polling (Paolo).
> >
> >
> >
> First of all, please CC power management patches (including cpuidle,
> cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> list who may want to see your changes before they go in) and CC
> cpuidle material (in particular) to Peter Zijlstra.

Ok, Peter is CC'ed, will include linux-pm@vger in the next patches.

> Second, I'm not a big fan of this approach to be honest, as it kind
> of is a driver trying to play the role of a governor.

Well, its not trying to choose which idle state to enter, because
there is only one idle state to enter when virtualized (HLT).

> We have a "polling state" already that could be used here in
> principle so I wonder what would be wrong with that.  

There is no "target residency" concept in the virtualized use-case 
(which is what poll_state.c uses to calculate the poll time).

Moreover the cpuidle-haltpoll driver uses an adaptive logic to
tune poll time (which appparently does not make sense for poll_state).

The only thing they share is the main loop structure:

"while (!need_resched()) {
	cpu_relax();
	now = ktime_get();
}"

> Also note that
> there seems to be at least some code duplication between your code
> and the "polling state" implementation, so maybe it would be
> possible to do some things in a common way?

Again, its just the main loop structure that is shared:
there is no target residency in the virtualized case, 
and we want an adaptive scheme.

Lets think about deduplication: you would have a cpuidle driver,
with a fake "target residency". 

Now, it makes no sense to use a governor for the virtualized case
(again, there is only one idle state: HLT, the host governor is
used for the actual idle state decision in the host).

So i fail to see how i would go about integrating these two 
and what are the advantages of doing so ? 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-07 17:16   ` Marcelo Tosatti
@ 2019-06-07 18:22     ` Paolo Bonzini
  2019-06-07 21:38       ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-07 18:22 UTC (permalink / raw)
  To: Marcelo Tosatti, Rafael J. Wysocki
  Cc: kvm-devel, Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On 07/06/19 19:16, Marcelo Tosatti wrote:
> There is no "target residency" concept in the virtualized use-case 
> (which is what poll_state.c uses to calculate the poll time).

Actually there is: it is the cost of a vmexit, and it be calibrated with
a very short CPUID loop (e.g. run 100 CPUID instructions and take the
smallest TSC interval---it should take less than 50 microseconds, and
less than a millisecond even on nested virt).

I think it would make sense to improve poll_state.c to use an adaptive
algorithm similar to the one you implemented, which includes optionally
allowing to poll for an interval larger than the target residency.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver
  2019-06-06 17:51   ` Andrea Arcangeli
@ 2019-06-07 20:20     ` Marcelo Tosatti
  0 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-07 20:20 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kvm-devel, Paolo Bonzini, Radim Krcmar, Rafael J. Wysocki,
	Peter Zijlstra, Wanpeng Li, Konrad Rzeszutek Wilk,
	Raslan KarimAllah, Boris Ostrovsky, Ankur Arora,
	Christian Borntraeger

Hi Andrea,

On Thu, Jun 06, 2019 at 01:51:03PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Mon, Jun 03, 2019 at 07:52:43PM -0300, Marcelo Tosatti wrote:
> > +unsigned int guest_halt_poll_ns = 200000;
> > +module_param(guest_halt_poll_ns, uint, 0644);
> > +
> > +/* division factor to shrink halt_poll_ns */
> > +unsigned int guest_halt_poll_shrink = 2;
> > +module_param(guest_halt_poll_shrink, uint, 0644);
> > +
> > +/* multiplication factor to grow per-cpu halt_poll_ns */
> > +unsigned int guest_halt_poll_grow = 2;
> > +module_param(guest_halt_poll_grow, uint, 0644);
> > +
> > +/* value in ns to start growing per-cpu halt_poll_ns */
> > +unsigned int guest_halt_poll_grow_start = 10000;
> > +module_param(guest_halt_poll_grow_start, uint, 0644);
> > +
> > +/* value in ns to start growing per-cpu halt_poll_ns */
> > +bool guest_halt_poll_allow_shrink = true;
> > +module_param(guest_halt_poll_allow_shrink, bool, 0644);
> 
> These variables can all be static. They also should be __read_mostly
> to be sure not to unnecessarily hit false sharing while going idle.

Fixed.

> 
> > +		while (!need_resched()) {
> > +			cpu_relax();
> > +			now = ktime_get();
> > +
> > +			if (!ktime_before(now, end_spin)) {
> > +				do_halt = 1;
> > +				break;
> > +			}
> > +		}
> 
> On skylake pause takes ~75 cycles with ple_gap=0 (and Marcelo found it
> takes 6 cycles with pause loop exiting enabled but that shall be fixed
> in the CPU and we can ignore it).

Right, that is a generic problem.

> So we could call ktime_get() only once every 100 times or more and
> we'd be still accurate down to at least 1usec.
> 
> Ideally we'd like a ktime_try_get() that will break the seqcount loop
> if read_seqcount_retry fails. Something like below pseudocode:
> 
> #define KTIME_ERR ((ktime_t) { .tv64 = 0 })
> 
> ktime_t ktime_try_get(void)
> {
> 	[..]
> 	seq = read_seqcount_begin(&timekeeper_seq);
> 	secs = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
> 	nsecs = timekeeping_get_ns(&tk->tkr_mono) +
> 		tk->wall_to_monotonic.tv_nsec;
> 	if (unlikely(read_seqcount_retry(&timekeeper_seq, seq)))
> 		return KTIME_ERR;
> 	[..]
> }
> 
> If it ktime_try_get() fails we keep calling it at every iteration of
> the loop, when finally it succeeds we call it again only after 100
> pause instructions or more. So we continue polling need_resched()
> while we wait timerkeeper_seq to be released (and hopefully by looping
> 100 times or more we'll reduce the frequency when we find
> timekeeper_seq locked).
> 
> All we care is to react to need_resched ASAP and to have a resolution
> of the order of 1usec for the spin time.
> 
> If 100 is too wired a new module parameter in __read_mostly section
> configured to 100 or more by default, sounds preferable than hitting
> every 75nsec on the timekeeper_seq seqcount cacheline.

I'll switch to the cheaper sched_clock as suggested by Peter. 

Thanks!

> 
> I doubt it'd make any measurable difference with a few vcpus, but with
> hundred of host CPUs and vcpus perhaps it's worth it.
> 
> This of course can be done later once the patch is merged and if it's
> confirmed the above makes sense in practice and not just in theory. I
> wouldn't want to delay the merging for a possible micro optimization.
> 
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 18:25     ` Joao Martins
  2019-06-06 18:36       ` Andrea Arcangeli
@ 2019-06-07 20:25       ` Marcelo Tosatti
  1 sibling, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-07 20:25 UTC (permalink / raw)
  To: Joao Martins
  Cc: kvm-devel, Paolo Bonzini, Radim Krcmar, Andrea Arcangeli,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

Hi Joao,

On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
> On 6/4/19 1:24 PM, Marcelo Tosatti wrote:
> > 
> > When performing guest side polling, it is not necessary to 
> > also perform host side polling. 
> > 
> > So disable host side polling, via the new MSR interface, 
> > when loading cpuidle-haltpoll driver.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > ---
> > 
> > v2: remove extra "}"
> > 
> >  arch/x86/Kconfig                        |    7 +++++
> >  arch/x86/include/asm/cpuidle_haltpoll.h |    8 ++++++
> >  arch/x86/kernel/kvm.c                   |   40 ++++++++++++++++++++++++++++++++
> >  drivers/cpuidle/cpuidle-haltpoll.c      |    9 ++++++-
> >  include/linux/cpuidle_haltpoll.h        |   16 ++++++++++++
> >  5 files changed, 79 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/arch/x86/include/asm/cpuidle_haltpoll.h	2019-06-03 19:38:42.328718617 -0300
> > @@ -0,0 +1,8 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ARCH_HALTPOLL_H
> > +#define _ARCH_HALTPOLL_H
> > +
> > +void arch_haltpoll_enable(void);
> > +void arch_haltpoll_disable(void);
> > +
> > +#endif
> > Index: linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c
> > ===================================================================
> > --- linux-2.6.git.orig/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:12.376619124 -0300
> > +++ linux-2.6.git/drivers/cpuidle/cpuidle-haltpoll.c	2019-06-03 19:38:42.328718617 -0300
> > @@ -15,6 +15,7 @@
> >  #include <linux/module.h>
> >  #include <linux/timekeeping.h>
> >  #include <linux/sched/idle.h>
> > +#include <linux/cpuidle_haltpoll.h>
> >  #define CREATE_TRACE_POINTS
> >  #include "cpuidle-haltpoll-trace.h"
> >  
> > @@ -157,11 +158,17 @@
> >  
> >  static int __init haltpoll_init(void)
> >  {
> > -	return cpuidle_register(&haltpoll_driver, NULL);
> > +	int ret = cpuidle_register(&haltpoll_driver, NULL);
> > +
> > +	if (ret == 0)
> > +		arch_haltpoll_enable();
> > +
> > +	return ret;
> >  }
> >  
> >  static void __exit haltpoll_exit(void)
> >  {
> > +	arch_haltpoll_disable();
> >  	cpuidle_unregister(&haltpoll_driver);
> >  }
> >  
> > Index: linux-2.6.git/include/linux/cpuidle_haltpoll.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/include/linux/cpuidle_haltpoll.h	2019-06-03 19:41:57.293366260 -0300
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _CPUIDLE_HALTPOLL_H
> > +#define _CPUIDLE_HALTPOLL_H
> > +
> > +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
> > +#include <asm/cpuidle_haltpoll.h>
> > +#else
> > +static inline void arch_haltpoll_enable(void)
> > +{
> > +}
> > +
> > +static inline void arch_haltpoll_disable(void)
> > +{
> > +}
> > +#endif
> > +#endif
> > Index: linux-2.6.git/arch/x86/Kconfig
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/Kconfig	2019-06-03 19:38:12.376619124 -0300
> > +++ linux-2.6.git/arch/x86/Kconfig	2019-06-03 19:42:34.478489868 -0300
> > @@ -787,6 +787,7 @@
> >  	bool "KVM Guest support (including kvmclock)"
> >  	depends on PARAVIRT
> >  	select PARAVIRT_CLOCK
> > +	select ARCH_CPUIDLE_HALTPOLL
> >  	default y
> >  	---help---
> >  	  This option enables various optimizations for running under the KVM
> > @@ -795,6 +796,12 @@
> >  	  underlying device model, the host provides the guest with
> >  	  timing infrastructure such as time of day, and system time
> >  
> > +config ARCH_CPUIDLE_HALTPOLL
> > +        def_bool n
> > +        proUmpt "Disable host haltpoll when loading haltpoll driver"
> > +        help
> > +	  If virtualized under KVM, disable host haltpoll.
> > +
> >  config PVH
> >  	bool "Support for running PVH guests"
> >  	---help---
> > Index: linux-2.6.git/arch/x86/kernel/kvm.c
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/kernel/kvm.c	2019-06-03 19:38:12.376619124 -0300
> > +++ linux-2.6.git/arch/x86/kernel/kvm.c	2019-06-03 19:40:14.359024312 -0300
> > @@ -853,3 +853,43 @@
> >  }
> >  
> >  #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
> > +
> > +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
> > +
> > +void kvm_disable_host_haltpoll(void *i)
> > +{
> > +	wrmsrl(MSR_KVM_POLL_CONTROL, 0);
> > +}
> > +
> > +void kvm_enable_host_haltpoll(void *i)
> > +{
> > +	wrmsrl(MSR_KVM_POLL_CONTROL, 1);
> > +}
> > +
> > +void arch_haltpoll_enable(void)
> > +{
> > +	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
> > +		return;
> > +
> 
> Perhaps warn the user when failing to disable host poll e.g.:
> 
> if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
> 	pr_warn_once("haltpoll: Failed to disable host halt polling\n");
> 	return;
> }
> 
> But I wonder whether we should fail to load cpuidle-haltpoll when host halt
> polling can't be disabled[*]? That is to avoid polling in both host and guest
> and *possibly* avoid chances for performance regressions when running on older
> hypervisors?
> 
> [*] with guest still able load with lack of host polling control via modparam

I don't think so. First, the driver reduces the halt poll interval when
the system is idle, so overhead is reduced to zero in such cases.

Moreover, when both host and guest side polling are performed, 
power might be wasted, but i haven't seen a significant 
performance regression caused by it.

Thanks.

> 
> > +	preempt_disable();
> > +	/* Enabling guest halt poll disables host halt poll */
> > +	kvm_disable_host_haltpoll(NULL);
> > +	smp_call_function(kvm_disable_host_haltpoll, NULL, 1);
> > +	preempt_enable();
> > +}
> > +EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
> > +
> > +void arch_haltpoll_disable(void)
> > +{
> > +	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
> > +		return;
> > +
> > +	preempt_disable();
> > +	/* Enabling guest halt poll disables host halt poll */
> > +	kvm_enable_host_haltpoll(NULL);
> > +	smp_call_function(kvm_enable_host_haltpoll, NULL, 1);
> > +	preempt_enable();
> > +}
> > +
> > +EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
> > +#endif
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch v2 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized
  2019-06-06 18:36       ` Andrea Arcangeli
  2019-06-06 18:51         ` Joao Martins
@ 2019-06-07 20:38         ` Marcelo Tosatti
  1 sibling, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-07 20:38 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Joao Martins, kvm-devel, Paolo Bonzini, Radim Krcmar,
	Rafael J. Wysocki, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger

On Thu, Jun 06, 2019 at 02:36:32PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Jun 06, 2019 at 07:25:28PM +0100, Joao Martins wrote:
> > But I wonder whether we should fail to load cpuidle-haltpoll when host halt
> > polling can't be disabled[*]? That is to avoid polling in both host and guest
> > and *possibly* avoid chances for performance regressions when running on older
> > hypervisors?
> 
> I don't think it's necessary: that would force an upgrade of the host
> KVM version in order to use the guest haltpoll feature with an
> upgraded guest kernel that can use the guest haltpoll.
> 
> The guest haltpoll is self contained in the guest, so there's no
> reason to prevent that by design or to force upgrade of the KVM host
> version. It'd be more than enough to reload kvm.ko in the host with
> the host haltpoll set to zero with the module parameter already
> available, to achieve the same runtime without requiring a forced host
> upgrade.
> 
> The warning however sounds sensible.

Alright, added a warning:

void arch_haltpoll_enable(void)
{
        if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
                WARN_ONCE(1, "kvm: host does not support poll control, can't "
                             "disable host polling (host upgrade "
                             " recommended).\n"
                return;
        }

Thanks


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-07 18:22     ` Paolo Bonzini
@ 2019-06-07 21:38       ` Marcelo Tosatti
  0 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-07 21:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rafael J. Wysocki, kvm-devel, Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Fri, Jun 07, 2019 at 08:22:35PM +0200, Paolo Bonzini wrote:
> On 07/06/19 19:16, Marcelo Tosatti wrote:
> > There is no "target residency" concept in the virtualized use-case 
> > (which is what poll_state.c uses to calculate the poll time).
> 
> Actually there is: it is the cost of a vmexit, and it be calibrated with
> a very short CPUID loop (e.g. run 100 CPUID instructions and take the
> smallest TSC interval---it should take less than 50 microseconds, and
> less than a millisecond even on nested virt).

For a given application, you want to configure the poll time to the
maximum time an event happen after starting the idle procedure. For SAP
HANA, that value is between 200us - 800us (most tests require less than
800us, but some require 800us, to significantly avoid the IPIs).

"The target residency is the minimum time the hardware must spend in the
given state, including the time needed to enter it (which may be
substantial), in order to save more energy than it would save by
entering one of the shallower idle states instead."

Clearly these are two different things...

> I think it would make sense to improve poll_state.c to use an adaptive
> algorithm similar to the one you implemented, which includes optionally
> allowing to poll for an interval larger than the target residency.
> 
> Paolo

Ok, so i'll move the adaptive code to poll_state.c, where
the driver selects whether to use target_residency or the 
adaptive value (based on module parameters).

Not sure if its an adaptible value is desirable for 
the non virtualized case.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-07  9:49 ` [patch 0/3] cpuidle-haltpoll driver (v2) Rafael J. Wysocki
  2019-06-07 17:16   ` Marcelo Tosatti
@ 2019-06-10 14:59   ` Marcelo Tosatti
  2019-06-10 22:03     ` Rafael J. Wysocki
  1 sibling, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-10 14:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: kvm-devel, Paolo Bonzini, Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> >amount of time before halting. This provides the following benefits
> >to host side polling:
> >
> >         1) The POLL flag is set while polling is performed, which allows
> >            a remote vCPU to avoid sending an IPI (and the associated
> >            cost of handling the IPI) when performing a wakeup.
> >
> >         2) The HLT VM-exit cost can be avoided.
> >
> >The downside of guest side polling is that polling is performed
> >even with other runnable tasks in the host.
> >
> >Results comparing halt_poll_ns and server/client application
> >where a small packet is ping-ponged:
> >
> >host                                        --> 31.33
> >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> >
> >For the SAP HANA benchmarks (where idle_spin is a parameter
> >of the previous version of the patch, results should be the
> >same):
> >
> >hpns == halt_poll_ns
> >
> >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> >                           hpns=200000    hpns=0            hpns=800000
> >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)	   1.29   (-3.7%)
> >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)	   4.53   (-5%)
> >
> >V2:
> >
> >- Move from x86 to generic code (Paolo/Christian).
> >- Add auto-tuning logic (Paolo).
> >- Add MSR to disable host side polling (Paolo).
> >
> >
> >
> First of all, please CC power management patches (including cpuidle,
> cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> list who may want to see your changes before they go in) and CC
> cpuidle material (in particular) to Peter Zijlstra.
> 
> Second, I'm not a big fan of this approach to be honest, as it kind
> of is a driver trying to play the role of a governor.
> 
> We have a "polling state" already that could be used here in
> principle so I wonder what would be wrong with that.  Also note that
> there seems to be at least some code duplication between your code
> and the "polling state" implementation, so maybe it would be
> possible to do some things in a common way?

Hi Rafael,

After modifying poll_state.c to use a generic "poll time" driver 
callback [1] (since using a variable "target_residency" for that 
looks really ugly), would need a governor which does:

haltpoll_governor_select_next_state()
	if (prev_state was poll and evt happened on prev poll window) -> POLL.
	if (prev_state == HLT)	-> POLL
	otherwise 		-> HLT

And a "default_idle" cpuidle driver that:

defaultidle_idle()
	if (current_clr_polling_and_test()) {
		local_irq_enable();
		return index;
	}
	default_idle(); 
	return

Using such governor with any other cpuidle driver would 
be pointless (since it would enter the first state only
and therefore not save power).

Not certain about using the default_idle driver with 
other governors: one would rather use a driver that 
supports all states on a given machine.

This combination of governor/driver pair, for the sake
of sharing the idle loop, seems awkward to me.
And fails the governor/driver separation: one will use the
pair in practice.

But i have no problem with it, so i'll proceed with that.

Let me know otherwise.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-10 14:59   ` Marcelo Tosatti
@ 2019-06-10 22:03     ` Rafael J. Wysocki
  2019-06-11 14:26       ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Rafael J. Wysocki @ 2019-06-10 22:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Rafael J. Wysocki, kvm-devel, Paolo Bonzini,
	Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> > >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> > >amount of time before halting. This provides the following benefits
> > >to host side polling:
> > >
> > >         1) The POLL flag is set while polling is performed, which allows
> > >            a remote vCPU to avoid sending an IPI (and the associated
> > >            cost of handling the IPI) when performing a wakeup.
> > >
> > >         2) The HLT VM-exit cost can be avoided.
> > >
> > >The downside of guest side polling is that polling is performed
> > >even with other runnable tasks in the host.
> > >
> > >Results comparing halt_poll_ns and server/client application
> > >where a small packet is ping-ponged:
> > >
> > >host                                        --> 31.33
> > >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> > >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> > >
> > >For the SAP HANA benchmarks (where idle_spin is a parameter
> > >of the previous version of the patch, results should be the
> > >same):
> > >
> > >hpns == halt_poll_ns
> > >
> > >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> > >                           hpns=200000    hpns=0            hpns=800000
> > >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> > >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> > >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)           1.29   (-3.7%)
> > >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)    4.53   (-5%)
> > >
> > >V2:
> > >
> > >- Move from x86 to generic code (Paolo/Christian).
> > >- Add auto-tuning logic (Paolo).
> > >- Add MSR to disable host side polling (Paolo).
> > >
> > >
> > >
> > First of all, please CC power management patches (including cpuidle,
> > cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> > list who may want to see your changes before they go in) and CC
> > cpuidle material (in particular) to Peter Zijlstra.
> >
> > Second, I'm not a big fan of this approach to be honest, as it kind
> > of is a driver trying to play the role of a governor.
> >
> > We have a "polling state" already that could be used here in
> > principle so I wonder what would be wrong with that.  Also note that
> > there seems to be at least some code duplication between your code
> > and the "polling state" implementation, so maybe it would be
> > possible to do some things in a common way?
>
> Hi Rafael,
>
> After modifying poll_state.c to use a generic "poll time" driver
> callback [1] (since using a variable "target_residency" for that
> looks really ugly), would need a governor which does:
>
> haltpoll_governor_select_next_state()
>         if (prev_state was poll and evt happened on prev poll window) -> POLL.
>         if (prev_state == HLT)  -> POLL
>         otherwise               -> HLT
>
> And a "default_idle" cpuidle driver that:
>
> defaultidle_idle()
>         if (current_clr_polling_and_test()) {
>                 local_irq_enable();
>                 return index;
>         }
>         default_idle();
>         return
>
> Using such governor with any other cpuidle driver would
> be pointless (since it would enter the first state only
> and therefore not save power).
>
> Not certain about using the default_idle driver with
> other governors: one would rather use a driver that
> supports all states on a given machine.
>
> This combination of governor/driver pair, for the sake
> of sharing the idle loop, seems awkward to me.
> And fails the governor/driver separation: one will use the
> pair in practice.
>
> But i have no problem with it, so i'll proceed with that.
>
> Let me know otherwise.

If my understanding of your argumentation is correct, it is only
necessary to take the default_idle_call() branch of
cpuidle_idle_call() in the VM case, so it should be sufficient to
provide a suitable default_idle_call() which is what you seem to be
trying to do.

I might have been confused by the terminology used in the patch series
if that's the case.

Also, if that's the case, this is not cpuidle matter really.  It is a
matter of providing a better default_idle_call() for the arch at hand.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-10 22:03     ` Rafael J. Wysocki
@ 2019-06-11 14:26       ` Marcelo Tosatti
  2019-06-11 21:24         ` Rafael J. Wysocki
  0 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-11 14:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, kvm-devel, Paolo Bonzini,
	Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Tue, Jun 11, 2019 at 12:03:26AM +0200, Rafael J. Wysocki wrote:
> On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> > > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> > > >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> > > >amount of time before halting. This provides the following benefits
> > > >to host side polling:
> > > >
> > > >         1) The POLL flag is set while polling is performed, which allows
> > > >            a remote vCPU to avoid sending an IPI (and the associated
> > > >            cost of handling the IPI) when performing a wakeup.
> > > >
> > > >         2) The HLT VM-exit cost can be avoided.
> > > >
> > > >The downside of guest side polling is that polling is performed
> > > >even with other runnable tasks in the host.
> > > >
> > > >Results comparing halt_poll_ns and server/client application
> > > >where a small packet is ping-ponged:
> > > >
> > > >host                                        --> 31.33
> > > >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> > > >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> > > >
> > > >For the SAP HANA benchmarks (where idle_spin is a parameter
> > > >of the previous version of the patch, results should be the
> > > >same):
> > > >
> > > >hpns == halt_poll_ns
> > > >
> > > >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> > > >                           hpns=200000    hpns=0            hpns=800000
> > > >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> > > >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> > > >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)           1.29   (-3.7%)
> > > >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)    4.53   (-5%)
> > > >
> > > >V2:
> > > >
> > > >- Move from x86 to generic code (Paolo/Christian).
> > > >- Add auto-tuning logic (Paolo).
> > > >- Add MSR to disable host side polling (Paolo).
> > > >
> > > >
> > > >
> > > First of all, please CC power management patches (including cpuidle,
> > > cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> > > list who may want to see your changes before they go in) and CC
> > > cpuidle material (in particular) to Peter Zijlstra.
> > >
> > > Second, I'm not a big fan of this approach to be honest, as it kind
> > > of is a driver trying to play the role of a governor.
> > >
> > > We have a "polling state" already that could be used here in
> > > principle so I wonder what would be wrong with that.  Also note that
> > > there seems to be at least some code duplication between your code
> > > and the "polling state" implementation, so maybe it would be
> > > possible to do some things in a common way?
> >
> > Hi Rafael,
> >
> > After modifying poll_state.c to use a generic "poll time" driver
> > callback [1] (since using a variable "target_residency" for that
> > looks really ugly), would need a governor which does:
> >
> > haltpoll_governor_select_next_state()
> >         if (prev_state was poll and evt happened on prev poll window) -> POLL.
> >         if (prev_state == HLT)  -> POLL
> >         otherwise               -> HLT
> >
> > And a "default_idle" cpuidle driver that:
> >
> > defaultidle_idle()
> >         if (current_clr_polling_and_test()) {
> >                 local_irq_enable();
> >                 return index;
> >         }
> >         default_idle();
> >         return
> >
> > Using such governor with any other cpuidle driver would
> > be pointless (since it would enter the first state only
> > and therefore not save power).
> >
> > Not certain about using the default_idle driver with
> > other governors: one would rather use a driver that
> > supports all states on a given machine.
> >
> > This combination of governor/driver pair, for the sake
> > of sharing the idle loop, seems awkward to me.
> > And fails the governor/driver separation: one will use the
> > pair in practice.
> >
> > But i have no problem with it, so i'll proceed with that.
> >
> > Let me know otherwise.
> 
> If my understanding of your argumentation is correct, it is only
> necessary to take the default_idle_call() branch of
> cpuidle_idle_call() in the VM case, so it should be sufficient to
> provide a suitable default_idle_call() which is what you seem to be
> trying to do.

In the VM case, we need to poll before actually halting (this is because
its tricky to implement MWAIT in guests, so polling for some amount 
of time allows the IPI avoidance optimization,
see trace_sched_wake_idle_without_ipi, to take place). 

The amount of time we poll is variable and adjusted (see adjust_haltpoll_ns 
in the patchset).

> I might have been confused by the terminology used in the patch series
> if that's the case.
> 
> Also, if that's the case, this is not cpuidle matter really.  It is a
> matter of providing a better default_idle_call() for the arch at hand.

Peter Zijlstra suggested a cpuidle driver for this. 

Also, other architectures will use the same "poll before exiting to VM"
logic (so we'd rather avoid duplicating this code): PPC, x86, S/390,
MIPS... So in my POV it makes sense to unify this.

So, back to your initial suggestion: 

Q) "Can you unify code with poll_state.c?" 
A) Yes, but it requires a new governor, which seems overkill and unfit
for the purpose.

Moreover, the logic in menu to decide whether its necessary or not
to stop sched tick is useful for us (so a default_idle_call is not
sufficient), because the cost of enabling/disabling the sched tick is
high on VMs.

So i'll fix the comments of the cpuidle driver (which everyone seems
to agree with, except your understandable distate for it) and repost.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-11 14:26       ` Marcelo Tosatti
@ 2019-06-11 21:24         ` Rafael J. Wysocki
  2019-06-17 15:57           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Rafael J. Wysocki @ 2019-06-11 21:24 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, kvm-devel, Paolo Bonzini,
	Radim KrÄ?máÅ?,
	Andrea Arcangeli, Peter Zijlstra, Wanpeng Li,
	Konrad Rzeszutek Wilk, Raslan KarimAllah, Boris Ostrovsky,
	Ankur Arora, Christian Borntraeger, Linux PM

On Tue, Jun 11, 2019 at 4:27 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Tue, Jun 11, 2019 at 12:03:26AM +0200, Rafael J. Wysocki wrote:
> > On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> > > > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> > > > >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified
> > > > >amount of time before halting. This provides the following benefits
> > > > >to host side polling:
> > > > >
> > > > >         1) The POLL flag is set while polling is performed, which allows
> > > > >            a remote vCPU to avoid sending an IPI (and the associated
> > > > >            cost of handling the IPI) when performing a wakeup.
> > > > >
> > > > >         2) The HLT VM-exit cost can be avoided.
> > > > >
> > > > >The downside of guest side polling is that polling is performed
> > > > >even with other runnable tasks in the host.
> > > > >
> > > > >Results comparing halt_poll_ns and server/client application
> > > > >where a small packet is ping-ponged:
> > > > >
> > > > >host                                        --> 31.33
> > > > >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> > > > >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> > > > >
> > > > >For the SAP HANA benchmarks (where idle_spin is a parameter
> > > > >of the previous version of the patch, results should be the
> > > > >same):
> > > > >
> > > > >hpns == halt_poll_ns
> > > > >
> > > > >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> > > > >                           hpns=200000    hpns=0            hpns=800000
> > > > >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> > > > >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> > > > >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)           1.29   (-3.7%)
> > > > >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)    4.53   (-5%)
> > > > >
> > > > >V2:
> > > > >
> > > > >- Move from x86 to generic code (Paolo/Christian).
> > > > >- Add auto-tuning logic (Paolo).
> > > > >- Add MSR to disable host side polling (Paolo).
> > > > >
> > > > >
> > > > >
> > > > First of all, please CC power management patches (including cpuidle,
> > > > cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> > > > list who may want to see your changes before they go in) and CC
> > > > cpuidle material (in particular) to Peter Zijlstra.
> > > >
> > > > Second, I'm not a big fan of this approach to be honest, as it kind
> > > > of is a driver trying to play the role of a governor.
> > > >
> > > > We have a "polling state" already that could be used here in
> > > > principle so I wonder what would be wrong with that.  Also note that
> > > > there seems to be at least some code duplication between your code
> > > > and the "polling state" implementation, so maybe it would be
> > > > possible to do some things in a common way?
> > >
> > > Hi Rafael,
> > >
> > > After modifying poll_state.c to use a generic "poll time" driver
> > > callback [1] (since using a variable "target_residency" for that
> > > looks really ugly), would need a governor which does:
> > >
> > > haltpoll_governor_select_next_state()
> > >         if (prev_state was poll and evt happened on prev poll window) -> POLL.
> > >         if (prev_state == HLT)  -> POLL
> > >         otherwise               -> HLT
> > >
> > > And a "default_idle" cpuidle driver that:
> > >
> > > defaultidle_idle()
> > >         if (current_clr_polling_and_test()) {
> > >                 local_irq_enable();
> > >                 return index;
> > >         }
> > >         default_idle();
> > >         return
> > >
> > > Using such governor with any other cpuidle driver would
> > > be pointless (since it would enter the first state only
> > > and therefore not save power).
> > >
> > > Not certain about using the default_idle driver with
> > > other governors: one would rather use a driver that
> > > supports all states on a given machine.
> > >
> > > This combination of governor/driver pair, for the sake
> > > of sharing the idle loop, seems awkward to me.
> > > And fails the governor/driver separation: one will use the
> > > pair in practice.
> > >
> > > But i have no problem with it, so i'll proceed with that.
> > >
> > > Let me know otherwise.
> >
> > If my understanding of your argumentation is correct, it is only
> > necessary to take the default_idle_call() branch of
> > cpuidle_idle_call() in the VM case, so it should be sufficient to
> > provide a suitable default_idle_call() which is what you seem to be
> > trying to do.
>
> In the VM case, we need to poll before actually halting (this is because
> its tricky to implement MWAIT in guests, so polling for some amount
> of time allows the IPI avoidance optimization,
> see trace_sched_wake_idle_without_ipi, to take place).
>
> The amount of time we poll is variable and adjusted (see adjust_haltpoll_ns
> in the patchset).
>
> > I might have been confused by the terminology used in the patch series
> > if that's the case.
> >
> > Also, if that's the case, this is not cpuidle matter really.  It is a
> > matter of providing a better default_idle_call() for the arch at hand.
>
> Peter Zijlstra suggested a cpuidle driver for this.

So I wonder what his rationale was.

> Also, other architectures will use the same "poll before exiting to VM"
> logic (so we'd rather avoid duplicating this code): PPC, x86, S/390,
> MIPS... So in my POV it makes sense to unify this.

The logic is fine IMO, but the implementation here is questionable.

> So, back to your initial suggestion:
>
> Q) "Can you unify code with poll_state.c?"
> A) Yes, but it requires a new governor, which seems overkill and unfit
> for the purpose.
>
> Moreover, the logic in menu to decide whether its necessary or not
> to stop sched tick is useful for us (so a default_idle_call is not
> sufficient), because the cost of enabling/disabling the sched tick is
> high on VMs.

So in fact you need a governor, but you really only need it to decide
whether or not to stop the tick for you.

menu has a quite high overhead for that. :-)

> So i'll fix the comments of the cpuidle driver (which everyone seems
> to agree with, except your understandable distate for it) and repost.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch 0/3] cpuidle-haltpoll driver (v2)
  2019-06-11 21:24         ` Rafael J. Wysocki
@ 2019-06-17 15:57           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2019-06-17 15:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Marcelo Tosatti, Rafael J. Wysocki, kvm-devel, Paolo Bonzini,
	Radim KrÄ?máÅ?,
	Andrea Arcangeli, Wanpeng Li, Konrad Rzeszutek Wilk,
	Raslan KarimAllah, Boris Ostrovsky, Ankur Arora,
	Christian Borntraeger, Linux PM

On Tue, Jun 11, 2019 at 11:24:39PM +0200, Rafael J. Wysocki wrote:
> > Peter Zijlstra suggested a cpuidle driver for this.
> 
> So I wonder what his rationale was.

I was thinking we don't need this hard-coded in the idle loop when virt
can load a special driver.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch 2/3] kvm: x86: add host poll control msrs
  2019-06-11 19:40 [patch 0/3] cpuidle-haltpoll driver (v3) Marcelo Tosatti
@ 2019-06-11 19:40 ` Marcelo Tosatti
  0 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2019-06-11 19:40 UTC (permalink / raw)
  To: kvm-devel
  Cc: Paolo Bonzini, Radim Krcmar, Andrea Arcangeli, Rafael J. Wysocki,
	Peter Zijlstra, Wanpeng Li, Konrad Rzeszutek Wilk,
	Raslan KarimAllah, Boris Ostrovsky, Ankur Arora,
	Christian Borntraeger, linux-pm, Marcelo Tosatti

[-- Attachment #1: 02-pollcontrol-host.patch --]
[-- Type: text/plain, Size: 5252 bytes --]

Add an MSRs which allows the guest to disable 
host polling (specifically the cpuidle-haltpoll, 
when performing polling in the guest, disables
host side polling).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 Documentation/virtual/kvm/msr.txt    |    9 +++++++++
 arch/x86/include/asm/kvm_host.h      |    2 ++
 arch/x86/include/uapi/asm/kvm_para.h |    2 ++
 arch/x86/kvm/Kconfig                 |    1 +
 arch/x86/kvm/cpuid.c                 |    3 ++-
 arch/x86/kvm/x86.c                   |   22 ++++++++++++++++++++++
 6 files changed, 38 insertions(+), 1 deletion(-)

Index: linux-2.6.git/Documentation/virtual/kvm/msr.txt
===================================================================
--- linux-2.6.git.orig/Documentation/virtual/kvm/msr.txt	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/Documentation/virtual/kvm/msr.txt	2019-06-11 16:04:09.917453903 -0300
@@ -273,3 +273,12 @@
 	guest must both read the least significant bit in the memory area and
 	clear it using a single CPU instruction, such as test and clear, or
 	compare and exchange.
+
+MSR_KVM_POLL_CONTROL: 0x4b564d05
+	Control host side polling.
+
+	data: Bit 0 enables (1) or disables (0) host halt poll
+	logic.
+	KVM guests can disable host halt polling when performing
+	polling themselves.
+
Index: linux-2.6.git/arch/x86/include/asm/kvm_host.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/kvm_host.h	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/arch/x86/include/asm/kvm_host.h	2019-06-11 16:04:09.917453903 -0300
@@ -755,6 +755,8 @@
 		struct gfn_to_hva_cache data;
 	} pv_eoi;
 
+	u64 msr_kvm_poll_control;
+
 	/*
 	 * Indicate whether the access faults on its page table in guest
 	 * which is set when fix page fault and used to detect unhandeable
Index: linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/uapi/asm/kvm_para.h	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/arch/x86/include/uapi/asm/kvm_para.h	2019-06-11 16:04:09.917453903 -0300
@@ -29,6 +29,7 @@
 #define KVM_FEATURE_PV_TLB_FLUSH	9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
 #define KVM_FEATURE_PV_SEND_IPI	11
+#define KVM_FEATURE_POLL_CONTROL	12
 
 #define KVM_HINTS_REALTIME      0
 
@@ -47,6 +48,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_POLL_CONTROL	0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
Index: linux-2.6.git/arch/x86/kvm/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/Kconfig	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/arch/x86/kvm/Kconfig	2019-06-11 16:04:09.917453903 -0300
@@ -41,6 +41,7 @@
 	select PERF_EVENTS
 	select HAVE_KVM_MSI
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
+	select HAVE_KVM_NO_POLL
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select KVM_VFIO
 	select SRCU
Index: linux-2.6.git/arch/x86/kvm/cpuid.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/cpuid.c	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/arch/x86/kvm/cpuid.c	2019-06-11 16:04:09.918453923 -0300
@@ -643,7 +643,8 @@
 			     (1 << KVM_FEATURE_PV_UNHALT) |
 			     (1 << KVM_FEATURE_PV_TLB_FLUSH) |
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
-			     (1 << KVM_FEATURE_PV_SEND_IPI);
+			     (1 << KVM_FEATURE_PV_SEND_IPI) |
+			     (1 << KVM_FEATURE_POLL_CONTROL);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
Index: linux-2.6.git/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kvm/x86.c	2019-06-11 15:54:16.500695464 -0300
+++ linux-2.6.git/arch/x86/kvm/x86.c	2019-06-11 16:04:48.002209329 -0300
@@ -1177,6 +1177,7 @@
 	MSR_IA32_POWER_CTL,
 
 	MSR_K7_HWCR,
+	MSR_KVM_POLL_CONTROL,
 };
 
 static unsigned num_emulated_msrs;
@@ -2628,6 +2629,14 @@
 			return 1;
 		break;
 
+	case MSR_KVM_POLL_CONTROL:
+		/* only enable bit supported */
+		if (data & (-1ULL << 1))
+			return 1;
+
+		vcpu->arch.msr_kvm_poll_control = data;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -2877,6 +2886,9 @@
 	case MSR_KVM_PV_EOI_EN:
 		msr_info->data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_POLL_CONTROL:
+		msr_info->data = vcpu->arch.msr_kvm_poll_control;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -8874,6 +8886,10 @@
 	msr.host_initiated = true;
 	kvm_write_tsc(vcpu, &msr);
 	vcpu_put(vcpu);
+
+	/* poll control enabled by default */
+	vcpu->arch.msr_kvm_poll_control = 1;
+
 	mutex_unlock(&vcpu->mutex);
 
 	if (!kvmclock_periodic_sync)
@@ -9948,6 +9964,12 @@
 }
 EXPORT_SYMBOL_GPL(kvm_vector_hashing_enabled);
 
+bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-06-17 15:57 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03 22:52 [patch 0/3] cpuidle-haltpoll driver (v2) Marcelo Tosatti
2019-06-03 22:52 ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Marcelo Tosatti
2019-06-05  8:16   ` Ankur Arora
2019-06-06 17:51   ` Andrea Arcangeli
2019-06-07 20:20     ` Marcelo Tosatti
2019-06-06 19:03   ` Peter Zijlstra
2019-06-07  9:54   ` Rafael J. Wysocki
2019-06-03 22:52 ` [patch 2/3] kvm: x86: add host poll control msrs Marcelo Tosatti
2019-06-06 12:04   ` Paolo Bonzini
2019-06-03 22:52 ` [patch 3/3] cpuidle-haltpoll: disable host side polling when kvm virtualized Marcelo Tosatti
2019-06-04  1:26   ` kbuild test robot
2019-06-04 12:24   ` [patch v2 " Marcelo Tosatti
2019-06-06 18:25     ` Joao Martins
2019-06-06 18:36       ` Andrea Arcangeli
2019-06-06 18:51         ` Joao Martins
2019-06-06 19:22           ` Joao Martins
2019-06-06 21:01             ` Andrea Arcangeli
2019-06-07 20:38         ` Marcelo Tosatti
2019-06-07 20:25       ` Marcelo Tosatti
2019-06-07  9:49 ` [patch 0/3] cpuidle-haltpoll driver (v2) Rafael J. Wysocki
2019-06-07 17:16   ` Marcelo Tosatti
2019-06-07 18:22     ` Paolo Bonzini
2019-06-07 21:38       ` Marcelo Tosatti
2019-06-10 14:59   ` Marcelo Tosatti
2019-06-10 22:03     ` Rafael J. Wysocki
2019-06-11 14:26       ` Marcelo Tosatti
2019-06-11 21:24         ` Rafael J. Wysocki
2019-06-17 15:57           ` Peter Zijlstra
2019-06-11 19:40 [patch 0/3] cpuidle-haltpoll driver (v3) Marcelo Tosatti
2019-06-11 19:40 ` [patch 2/3] kvm: x86: add host poll control msrs Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).