All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] Dynamic Pause Loop Exiting window.
@ 2014-08-20 20:53 Radim Krčmář
  2014-08-20 20:53 ` [PATCH v2 1/6] KVM: add kvm_arch_sched_in Radim Krčmář
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

v1 -> v2:
 * squashed [v1 4/9] and [v1 5/9] (clamping)
 * dropped [v1 7/9] (CPP abstractions)
 * merged core of [v1 9/9] into [v1 4/9] (automatic maximum)
 * reworked kernel_param_ops: closer to pure int [v2 6/6]
 * introduced ple_window_actual_max & reworked clamping [v2 4/6]
 * added seqlock for parameter modifications [v2 6/6]

---
PLE does not scale in its current form.  When increasing VCPU count
above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already
been well optimized for various workloads, so this series tries to
alleviate one different major problem while minimizing a chance of
regressions: we have too many useless PLE exits.

Just increasing PLE window would help some cases, but it still spirals
out of control.  By increasing the window after every PLE exit, we can
limit the amount of useless ones, so we don't reach the state where CPUs
spend 99% of the time waiting for a lock.

HP confirmed that this series prevents soft lockups and TSC sync errors
on large guests.

Radim Krčmář (6):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-VCPU
  KVM: VMX: dynamise PLE window
  KVM: trace kvm_ple_window
  KVM: VMX: runtime knobs for dynamic PLE window

 arch/arm/kvm/arm.c              |   4 ++
 arch/mips/kvm/mips.c            |   4 ++
 arch/powerpc/kvm/powerpc.c      |   4 ++
 arch/s390/kvm/kvm-s390.c        |   4 ++
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              |   6 ++
 arch/x86/kvm/trace.h            |  25 ++++++++
 arch/x86/kvm/vmx.c              | 124 ++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |   6 ++
 include/linux/kvm_host.h        |   2 +
 virt/kvm/kvm_main.c             |   2 +
 11 files changed, 179 insertions(+), 4 deletions(-)

-- 
2.0.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-21  8:29   ` Paolo Bonzini
  2014-08-20 20:53 ` [PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

Introduce preempt notifiers for architecture specific code.
Advantage over creating a new notifier in every arch is slightly simpler
code and guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/arm/kvm/arm.c         | 4 ++++
 arch/mips/kvm/mips.c       | 4 ++++
 arch/powerpc/kvm/powerpc.c | 4 ++++
 arch/s390/kvm/kvm-s390.c   | 4 ++++
 arch/x86/kvm/x86.c         | 4 ++++
 include/linux/kvm_host.h   | 2 ++
 virt/kvm/kvm_main.c        | 2 ++
 7 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..9f788eb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -288,6 +288,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index cd71141..2362df2 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1002,6 +1002,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 				  struct kvm_translation *tr)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4c79284..cbc432f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -720,6 +720,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	kvmppc_subarch_vcpu_uninit(vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 #ifdef CONFIG_BOOKE
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ce81eb2..a3c324e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -555,6 +555,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	/* Nothing todo */
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..d7c214f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7146,6 +7146,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 		static_key_slow_dec(&kvm_no_apic_vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	if (type)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..ebd7236 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -624,6 +624,8 @@ void kvm_arch_exit(void);
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..d3c3ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
 	if (vcpu->preempted)
 		vcpu->preempted = false;
 
+	kvm_arch_sched_in(vcpu, cpu);
+
 	kvm_arch_vcpu_load(vcpu, cpu);
 }
 
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
  2014-08-20 20:53 ` [PATCH v2 1/6] KVM: add kvm_arch_sched_in Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-20 20:53 ` [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU Radim Krčmář
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

sched_in preempt notifier is available for x86, allow its use in
specific virtualization technlogies as well.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/svm.c              | 6 ++++++
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 1 +
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..358e2f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,8 @@ struct kvm_x86_ops {
 	bool (*mpx_supported)(void);
 
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
+
+	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..4baf1bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4305,6 +4305,10 @@ static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 }
 
+static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -4406,6 +4410,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,
+
+	.sched_in = svm_sched_in,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..2b306f9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8846,6 +8846,10 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 	return X86EMUL_CONTINUE;
 }
 
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -8951,6 +8955,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.mpx_supported = vmx_mpx_supported,
 
 	.check_nested_events = vmx_check_nested_events,
+
+	.sched_in = vmx_sched_in,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7c214f..5696ee7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7148,6 +7148,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
  2014-08-20 20:53 ` [PATCH v2 1/6] KVM: add kvm_arch_sched_in Radim Krčmář
  2014-08-20 20:53 ` [PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-21  8:25   ` Paolo Bonzini
  2014-08-20 20:53 ` [PATCH v2 4/6] KVM: VMX: dynamise PLE window Radim Krčmář
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

Change PLE window into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Brings in a small overhead on every vmentry.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2b306f9..18e0e52 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -484,6 +484,9 @@ struct vcpu_vmx {
 
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
+
+	/* Dynamic PLE window. */
+	int ple_window;
 };
 
 enum segment_cache_field {
@@ -4402,7 +4405,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
-		vmcs_write32(PLE_WINDOW, ple_window);
+		vmx->ple_window = ple_window;
 	}
 
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
@@ -7387,6 +7390,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return;
 
+	if (ple_gap)
+		vmcs_write32(PLE_WINDOW, vmx->ple_window);
+
 	if (vmx->nested.sync_shadow_vmcs) {
 		copy_vmcs12_to_shadow(vmx);
 		vmx->nested.sync_shadow_vmcs = false;
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (2 preceding siblings ...)
  2014-08-20 20:53 ` [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-21  8:24   ` Paolo Bonzini
  2014-08-21  8:26   ` Paolo Bonzini
  2014-08-20 20:53 ` [PATCH v2 5/6] KVM: trace kvm_ple_window Radim Krčmář
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

Window is increased on every PLE exit and decreased on every sched_in.
The idea is that we don't want to PLE exit if there is no preemption
going on.
We do this with sched_in() because it does not hold rq lock.

There are two new kernel parameters for changing the window:
 ple_window_grow and ple_window_shrink
ple_window_grow affects the window on PLE exit and ple_window_shrink
does it on sched_in;  depending on their value, the window is modifier
like this: (ple_window is kvm_intel's global)

  ple_window_shrink/ |
  ple_window_grow    | PLE exit           | sched_in
  -------------------+--------------------+---------------------
  < 1                |  = ple_window      |  = ple_window
  < ple_window       | *= ple_window_grow | /= ple_window_shrink
  otherwise          | += ple_window_grow | -= ple_window_shrink

A third new parameter, ple_window_max, controls a maximal ple_window.
A minimum equals to ple_window.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 18e0e52..e63d7ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,14 +125,32 @@ module_param(nested, bool, S_IRUGO);
  * Time is measured based on a counter that runs at the same rate as the TSC,
  * refer SDM volume 3b section 21.6.13 & 22.1.3.
  */
-#define KVM_VMX_DEFAULT_PLE_GAP    128
-#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+#define KVM_VMX_DEFAULT_PLE_GAP           128
+#define KVM_VMX_DEFAULT_PLE_WINDOW        4096
+#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW   2
+#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
+#define KVM_VMX_DEFAULT_PLE_WINDOW_MAX    \
+		INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, int, S_IRUGO);
 
+/* Default doubles per-vcpu window every exit. */
+static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
+module_param(ple_window_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu window every exit to ple_window. */
+static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
+module_param(ple_window_shrink, int, S_IRUGO);
+
+/* Default is to compute the maximum so we can never overflow. */
+static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+static int ple_window_max        = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+module_param(ple_window_max, int, S_IRUGO);
+
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
@@ -5679,12 +5697,66 @@ out:
 	return ret;
 }
 
+static int __grow_ple_window(int val)
+{
+	if (ple_window_grow < 1)
+		return ple_window;
+
+	val = min(val, ple_window_actual_max);
+
+	if (ple_window_grow < ple_window)
+		val *= ple_window_grow;
+	else
+		val += ple_window_grow;
+
+	return val;
+}
+
+static int __shrink_ple_window(int val, int shrinker, int minimum)
+{
+	if (shrinker < 1)
+		return ple_window;
+
+	if (shrinker < ple_window)
+		val /= shrinker;
+	else
+		val -= shrinker;
+
+	return max(val, minimum);
+}
+
+static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int new;
+
+	if (grow)
+		new = __grow_ple_window(vmx->ple_window);
+	else
+		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
+		                          ple_window);
+
+	vmx->ple_window = max(new, ple_window);
+}
+#define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
+#define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
+
+static void update_ple_window_actual_max(void)
+{
+	ple_window_actual_max =
+			__shrink_ple_window(max(ple_window_max, ple_window),
+			                    ple_window_grow, INT_MIN);
+}
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	if (ple_gap)
+		grow_ple_window(vcpu);
+
 	skip_emulated_instruction(vcpu);
 	kvm_vcpu_on_spin(vcpu);
 
@@ -8854,6 +8926,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 
 void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	if (ple_gap)
+		shrink_ple_window(vcpu);
 }
 
 static struct kvm_x86_ops vmx_x86_ops = {
@@ -9077,6 +9151,8 @@ static int __init vmx_init(void)
 	} else
 		kvm_disable_tdp();
 
+	update_ple_window_actual_max();
+
 	return 0;
 
 out7:
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/6] KVM: trace kvm_ple_window
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (3 preceding siblings ...)
  2014-08-20 20:53 ` [PATCH v2 4/6] KVM: VMX: dynamise PLE window Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-21  8:29   ` Paolo Bonzini
  2014-08-20 20:53 ` [PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
  2014-08-21  6:49   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  6 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

Tracepoint for dynamic PLE window, fired on every potential change.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/trace.h | 25 +++++++++++++++++++++++++
 arch/x86/kvm/vmx.c   |  8 +++++---
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e850a7d..4b8e6cb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
 		  __print_symbolic(__entry->host_clock, host_clocks))
 );
 
+TRACE_EVENT(kvm_ple_window,
+	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
+	TP_ARGS(grow, vcpu_id, new, old),
+
+	TP_STRUCT__entry(
+		__field(                 int,      grow         )
+		__field(        unsigned int,   vcpu_id         )
+		__field(                 int,       new         )
+		__field(                 int,       old         )
+	),
+
+	TP_fast_assign(
+		__entry->grow           = grow;
+		__entry->vcpu_id        = vcpu_id;
+		__entry->new            = new;
+		__entry->old            = old;
+	),
+
+	TP_printk("vcpu %u: ple_window %d %s %d",
+	          __entry->vcpu_id,
+	          __entry->new,
+	          __entry->grow ? "<+" : "<-",
+	          __entry->old)
+);
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* _TRACE_KVM_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e63d7ac..f63ac5d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5728,15 +5728,17 @@ static int __shrink_ple_window(int val, int shrinker, int minimum)
 static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int old = vmx->ple_window;
 	int new;
 
 	if (grow)
-		new = __grow_ple_window(vmx->ple_window);
+		new = __grow_ple_window(old)
 	else
-		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
-		                          ple_window);
+		new = __shrink_ple_window(old, ple_window_shrink, ple_window);
 
 	vmx->ple_window = max(new, ple_window);
+
+	trace_kvm_ple_window(grow, vcpu->vcpu_id, vmx->ple_window, old);
 }
 #define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
 #define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5696ee7..814b20c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7648,3 +7648,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (4 preceding siblings ...)
  2014-08-20 20:53 ` [PATCH v2 5/6] KVM: trace kvm_ple_window Radim Krčmář
@ 2014-08-20 20:53 ` Radim Krčmář
  2014-08-21  6:49   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  6 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-20 20:53 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi, Christian Borntraeger

ple_window is updated on every vmentry, so there is no reason to have it
read-only anymore.
ple_window* weren't writable to prevent runtime overflow races;
they are prevented by a seqlock.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 48 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f63ac5d..bd73fa1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -132,24 +132,29 @@ module_param(nested, bool, S_IRUGO);
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX    \
 		INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
 
+static struct kernel_param_ops param_ops_ple_t;
+#define param_check_ple_t(name, p) __param_check(name, p, int)
+
+static DEFINE_SEQLOCK(ple_window_seqlock);
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
-module_param(ple_window, int, S_IRUGO);
+module_param(ple_window, ple_t, S_IRUGO | S_IWUSR);
 
 /* Default doubles per-vcpu window every exit. */
 static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
-module_param(ple_window_grow, int, S_IRUGO);
+module_param(ple_window_grow, ple_t, S_IRUGO | S_IWUSR);
 
 /* Default resets per-vcpu window every exit to ple_window. */
 static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
-module_param(ple_window_shrink, int, S_IRUGO);
+module_param(ple_window_shrink, int, S_IRUGO | S_IWUSR);
 
 /* Default is to compute the maximum so we can never overflow. */
 static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
 static int ple_window_max        = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
-module_param(ple_window_max, int, S_IRUGO);
+module_param(ple_window_max, ple_t, S_IRUGO | S_IWUSR);
 
 extern const ulong vmx_return;
 
@@ -5730,13 +5735,19 @@ static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int old = vmx->ple_window;
 	int new;
+	unsigned seq;
 
-	if (grow)
-		new = __grow_ple_window(old)
-	else
-		new = __shrink_ple_window(old, ple_window_shrink, ple_window);
+	do {
+		seq = read_seqbegin(&ple_window_seqlock);
 
-	vmx->ple_window = max(new, ple_window);
+		if (grow)
+			new = __grow_ple_window(old);
+		else
+			new = __shrink_ple_window(old, ple_window_shrink,
+			                          ple_window);
+
+		vmx->ple_window = max(new, ple_window);
+	} while (read_seqretry(&ple_window_seqlock, seq));
 
 	trace_kvm_ple_window(grow, vcpu->vcpu_id, vmx->ple_window, old);
 }
@@ -5750,6 +5761,23 @@ static void update_ple_window_actual_max(void)
 			                    ple_window_grow, INT_MIN);
 }
 
+static int param_set_ple_t(const char *arg, const struct kernel_param *kp)
+{
+	int ret;
+
+	write_seqlock(&ple_window_seqlock);
+	ret = param_set_int(arg, kp);
+	update_ple_window_actual_max();
+	write_sequnlock(&ple_window_seqlock);
+
+	return ret;
+}
+
+static struct kernel_param_ops param_ops_ple_t = {
+	.set = param_set_ple_t,
+	.get = param_get_int,
+};
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
@@ -9153,8 +9181,6 @@ static int __init vmx_init(void)
 	} else
 		kvm_disable_tdp();
 
-	update_ple_window_actual_max();
-
 	return 0;
 
 out7:
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 0/6] Dynamic Pause Loop Exiting window.
  2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
@ 2014-08-21  6:49   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  2014-08-20 20:53 ` [PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC) @ 2014-08-21  6:49 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT, Vinod,
	Chegu, Christian Borntraeger

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2868 bytes --]

This patch have been tested by Lisa and me and it's success.

We created 4 VM guests and reboot them every 10 minutes for 12 hours around, and this issue is gone with the patch.


Please add Lisa and me to the "tested by:" list.
Tested-by: Mitchell, Lisa <lisa.mitchell@hp.com>
Tested-by: Zhao, Hui Zhi <hui-zhi.zhao@hp.com>


Regards,
Steven Zhao

-----Original Message-----
From: Radim Krčmář [mailto:rkrcmar@redhat.com] 
Sent: Thursday, August 21, 2014 4:53 AM
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini; Gleb Natapov; Raghavendra KT; Vinod, Chegu; Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC); Christian Borntraeger
Subject: [PATCH v2 0/6] Dynamic Pause Loop Exiting window.

v1 -> v2:
 * squashed [v1 4/9] and [v1 5/9] (clamping)
 * dropped [v1 7/9] (CPP abstractions)
 * merged core of [v1 9/9] into [v1 4/9] (automatic maximum)
 * reworked kernel_param_ops: closer to pure int [v2 6/6]
 * introduced ple_window_actual_max & reworked clamping [v2 4/6]
 * added seqlock for parameter modifications [v2 6/6]

---
PLE does not scale in its current form.  When increasing VCPU count above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already been well optimized for various workloads, so this series tries to alleviate one different major problem while minimizing a chance of
regressions: we have too many useless PLE exits.

Just increasing PLE window would help some cases, but it still spirals out of control.  By increasing the window after every PLE exit, we can limit the amount of useless ones, so we don't reach the state where CPUs spend 99% of the time waiting for a lock.

HP confirmed that this series prevents soft lockups and TSC sync errors on large guests.

Radim Krčmář (6):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-VCPU
  KVM: VMX: dynamise PLE window
  KVM: trace kvm_ple_window
  KVM: VMX: runtime knobs for dynamic PLE window

 arch/arm/kvm/arm.c              |   4 ++
 arch/mips/kvm/mips.c            |   4 ++
 arch/powerpc/kvm/powerpc.c      |   4 ++
 arch/s390/kvm/kvm-s390.c        |   4 ++
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              |   6 ++
 arch/x86/kvm/trace.h            |  25 ++++++++
 arch/x86/kvm/vmx.c              | 124 ++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |   6 ++
 include/linux/kvm_host.h        |   2 +
 virt/kvm/kvm_main.c             |   2 +
 11 files changed, 179 insertions(+), 4 deletions(-)

--
2.0.4

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 0/6] Dynamic Pause Loop Exiting window.
@ 2014-08-21  6:49   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  0 siblings, 0 replies; 28+ messages in thread
From: Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC) @ 2014-08-21  6:49 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT, Vinod,
	Chegu, Christian Borntraeger

This patch have been tested by Lisa and me and it's success.

We created 4 VM guests and reboot them every 10 minutes for 12 hours around, and this issue is gone with the patch.


Please add Lisa and me to the "tested by:" list.
Tested-by: Mitchell, Lisa <lisa.mitchell@hp.com>
Tested-by: Zhao, Hui Zhi <hui-zhi.zhao@hp.com>


Regards,
Steven Zhao

-----Original Message-----
From: Radim Krčmář [mailto:rkrcmar@redhat.com] 
Sent: Thursday, August 21, 2014 4:53 AM
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini; Gleb Natapov; Raghavendra KT; Vinod, Chegu; Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC); Christian Borntraeger
Subject: [PATCH v2 0/6] Dynamic Pause Loop Exiting window.

v1 -> v2:
 * squashed [v1 4/9] and [v1 5/9] (clamping)
 * dropped [v1 7/9] (CPP abstractions)
 * merged core of [v1 9/9] into [v1 4/9] (automatic maximum)
 * reworked kernel_param_ops: closer to pure int [v2 6/6]
 * introduced ple_window_actual_max & reworked clamping [v2 4/6]
 * added seqlock for parameter modifications [v2 6/6]

---
PLE does not scale in its current form.  When increasing VCPU count above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already been well optimized for various workloads, so this series tries to alleviate one different major problem while minimizing a chance of
regressions: we have too many useless PLE exits.

Just increasing PLE window would help some cases, but it still spirals out of control.  By increasing the window after every PLE exit, we can limit the amount of useless ones, so we don't reach the state where CPUs spend 99% of the time waiting for a lock.

HP confirmed that this series prevents soft lockups and TSC sync errors on large guests.

Radim Krčmář (6):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-VCPU
  KVM: VMX: dynamise PLE window
  KVM: trace kvm_ple_window
  KVM: VMX: runtime knobs for dynamic PLE window

 arch/arm/kvm/arm.c              |   4 ++
 arch/mips/kvm/mips.c            |   4 ++
 arch/powerpc/kvm/powerpc.c      |   4 ++
 arch/s390/kvm/kvm-s390.c        |   4 ++
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              |   6 ++
 arch/x86/kvm/trace.h            |  25 ++++++++
 arch/x86/kvm/vmx.c              | 124 ++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |   6 ++
 include/linux/kvm_host.h        |   2 +
 virt/kvm/kvm_main.c             |   2 +
 11 files changed, 179 insertions(+), 4 deletions(-)

--
2.0.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-20 20:53 ` [PATCH v2 4/6] KVM: VMX: dynamise PLE window Radim Krčmář
@ 2014-08-21  8:24   ` Paolo Bonzini
  2014-08-21 11:47     ` Radim Krčmář
  2014-08-21  8:26   ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21  8:24 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi,
	Christian Borntraeger

Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> +static void update_ple_window_actual_max(void)
> +{
> +	ple_window_actual_max =
> +			__shrink_ple_window(max(ple_window_max, ple_window),

Why the max() here?

> +			                    ple_window_grow, INT_MIN);

This should be 0 (in fact you can probably make everything unsigned now
that you've sorted out the overflows).

Paolo

> +}
> +


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU
  2014-08-20 20:53 ` [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU Radim Krčmář
@ 2014-08-21  8:25   ` Paolo Bonzini
  2014-08-21 11:38     ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21  8:25 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi,
	Christian Borntraeger

Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> Change PLE window into per-VCPU variable, seeded from module parameter,
> to allow greater flexibility.
> 
> Brings in a small overhead on every vmentry.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2b306f9..18e0e52 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -484,6 +484,9 @@ struct vcpu_vmx {
>  
>  	/* Support for a guest hypervisor (nested VMX) */
>  	struct nested_vmx nested;
> +
> +	/* Dynamic PLE window. */
> +	int ple_window;
>  };
>  
>  enum segment_cache_field {
> @@ -4402,7 +4405,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  
>  	if (ple_gap) {
>  		vmcs_write32(PLE_GAP, ple_gap);
> -		vmcs_write32(PLE_WINDOW, ple_window);
> +		vmx->ple_window = ple_window;
>  	}
>  
>  	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
> @@ -7387,6 +7390,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->emulation_required)
>  		return;
>  
> +	if (ple_gap)
> +		vmcs_write32(PLE_WINDOW, vmx->ple_window);

Maybe we can add a ple_window_dirty field?  It can be tested instead of
ple_gap to avoid the vmwrite in the common case?

Paolo

>  	if (vmx->nested.sync_shadow_vmcs) {
>  		copy_vmcs12_to_shadow(vmx);
>  		vmx->nested.sync_shadow_vmcs = false;
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-20 20:53 ` [PATCH v2 4/6] KVM: VMX: dynamise PLE window Radim Krčmář
  2014-08-21  8:24   ` Paolo Bonzini
@ 2014-08-21  8:26   ` Paolo Bonzini
  2014-08-21 11:54     ` Radim Krčmář
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21  8:26 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi,
	Christian Borntraeger

Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> Window is increased on every PLE exit and decreased on every sched_in.
> The idea is that we don't want to PLE exit if there is no preemption
> going on.
> We do this with sched_in() because it does not hold rq lock.
> 
> There are two new kernel parameters for changing the window:
>  ple_window_grow and ple_window_shrink
> ple_window_grow affects the window on PLE exit and ple_window_shrink
> does it on sched_in;  depending on their value, the window is modifier
> like this: (ple_window is kvm_intel's global)
> 
>   ple_window_shrink/ |
>   ple_window_grow    | PLE exit           | sched_in
>   -------------------+--------------------+---------------------
>   < 1                |  = ple_window      |  = ple_window
>   < ple_window       | *= ple_window_grow | /= ple_window_shrink
>   otherwise          | += ple_window_grow | -= ple_window_shrink
> 
> A third new parameter, ple_window_max, controls a maximal ple_window.
> A minimum equals to ple_window.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 78 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 18e0e52..e63d7ac 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -125,14 +125,32 @@ module_param(nested, bool, S_IRUGO);
>   * Time is measured based on a counter that runs at the same rate as the TSC,
>   * refer SDM volume 3b section 21.6.13 & 22.1.3.
>   */
> -#define KVM_VMX_DEFAULT_PLE_GAP    128
> -#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +#define KVM_VMX_DEFAULT_PLE_GAP           128
> +#define KVM_VMX_DEFAULT_PLE_WINDOW        4096
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW   2
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_MAX    \
> +		INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
> +
>  static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>  module_param(ple_gap, int, S_IRUGO);
>  
>  static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>  module_param(ple_window, int, S_IRUGO);
>  
> +/* Default doubles per-vcpu window every exit. */
> +static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
> +module_param(ple_window_grow, int, S_IRUGO);
> +
> +/* Default resets per-vcpu window every exit to ple_window. */
> +static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
> +module_param(ple_window_shrink, int, S_IRUGO);
> +
> +/* Default is to compute the maximum so we can never overflow. */
> +static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
> +static int ple_window_max        = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
> +module_param(ple_window_max, int, S_IRUGO);
> +
>  extern const ulong vmx_return;
>  
>  #define NR_AUTOLOAD_MSRS 8
> @@ -5679,12 +5697,66 @@ out:
>  	return ret;
>  }
>  
> +static int __grow_ple_window(int val)
> +{
> +	if (ple_window_grow < 1)
> +		return ple_window;
> +
> +	val = min(val, ple_window_actual_max);
> +
> +	if (ple_window_grow < ple_window)
> +		val *= ple_window_grow;
> +	else
> +		val += ple_window_grow;
> +
> +	return val;
> +}
> +
> +static int __shrink_ple_window(int val, int shrinker, int minimum)

s/shrinker/factor/ or s/shrinker/param/ (shrinker has another meaning in
the kernel).

> +{
> +	if (shrinker < 1)
> +		return ple_window;
> +
> +	if (shrinker < ple_window)
> +		val /= shrinker;
> +	else
> +		val -= shrinker;
> +
> +	return max(val, minimum);

Any reason to use anything but ple_window as the minimum, even in
update_ple_window_actual_max?

> +}
> +
> +static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int new;
> +
> +	if (grow)
> +		new = __grow_ple_window(vmx->ple_window);
> +	else
> +		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
> +		                          ple_window);
> +
> +	vmx->ple_window = max(new, ple_window);
> +}
> +#define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
> +#define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)

No macros please. :)

Paolo

> +
> +static void update_ple_window_actual_max(void)
> +{
> +	ple_window_actual_max =
> +			__shrink_ple_window(max(ple_window_max, ple_window),
> +			                    ple_window_grow, INT_MIN);
> +}
> +
>  /*
>   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>   * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
>   */
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
> +	if (ple_gap)
> +		grow_ple_window(vcpu);
> +
>  	skip_emulated_instruction(vcpu);
>  	kvm_vcpu_on_spin(vcpu);
>  
> @@ -8854,6 +8926,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
>  
>  void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  {
> +	if (ple_gap)
> +		shrink_ple_window(vcpu);
>  }
>  
>  static struct kvm_x86_ops vmx_x86_ops = {
> @@ -9077,6 +9151,8 @@ static int __init vmx_init(void)
>  	} else
>  		kvm_disable_tdp();
>  
> +	update_ple_window_actual_max();
> +
>  	return 0;
>  
>  out7:
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/6] KVM: trace kvm_ple_window
  2014-08-20 20:53 ` [PATCH v2 5/6] KVM: trace kvm_ple_window Radim Krčmář
@ 2014-08-21  8:29   ` Paolo Bonzini
  2014-08-21 11:56     ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21  8:29 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi,
	Christian Borntraeger

Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> Tracepoint for dynamic PLE window, fired on every potential change.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/trace.h | 25 +++++++++++++++++++++++++
>  arch/x86/kvm/vmx.c   |  8 +++++---
>  arch/x86/kvm/x86.c   |  1 +
>  3 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index e850a7d..4b8e6cb 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
>  		  __print_symbolic(__entry->host_clock, host_clocks))
>  );
>  
> +TRACE_EVENT(kvm_ple_window,
> +	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),

s/int grow/bool grow/ (and similarly in TP_STRUCT__entry).


> +	TP_ARGS(grow, vcpu_id, new, old),
> +
> +	TP_STRUCT__entry(
> +		__field(                 int,      grow         )
> +		__field(        unsigned int,   vcpu_id         )
> +		__field(                 int,       new         )
> +		__field(                 int,       old         )
> +	),
> +
> +	TP_fast_assign(
> +		__entry->grow           = grow;
> +		__entry->vcpu_id        = vcpu_id;
> +		__entry->new            = new;
> +		__entry->old            = old;
> +	),
> +
> +	TP_printk("vcpu %u: ple_window %d %s %d",
> +	          __entry->vcpu_id,
> +	          __entry->new,
> +	          __entry->grow ? "<+" : "<-",

? "grow" : "shrink"

With these changes,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> +	          __entry->old)
> +);
> +
>  #endif /* CONFIG_X86_64 */
>  
>  #endif /* _TRACE_KVM_H */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e63d7ac..f63ac5d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5728,15 +5728,17 @@ static int __shrink_ple_window(int val, int shrinker, int minimum)
>  static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int old = vmx->ple_window;
>  	int new;
>  
>  	if (grow)
> -		new = __grow_ple_window(vmx->ple_window);
> +		new = __grow_ple_window(old)
>  	else
> -		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
> -		                          ple_window);
> +		new = __shrink_ple_window(old, ple_window_shrink, ple_window);
>  
>  	vmx->ple_window = max(new, ple_window);
> +
> +	trace_kvm_ple_window(grow, vcpu->vcpu_id, vmx->ple_window, old);
>  }
>  #define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
>  #define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5696ee7..814b20c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7648,3 +7648,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-20 20:53 ` [PATCH v2 1/6] KVM: add kvm_arch_sched_in Radim Krčmář
@ 2014-08-21  8:29   ` Paolo Bonzini
  2014-08-21 11:38     ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21  8:29 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi,
	Christian Borntraeger

Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> Introduce preempt notifiers for architecture specific code.
> Advantage over creating a new notifier in every arch is slightly simpler
> code and guaranteed call order with respect to kvm_sched_in.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/arm/kvm/arm.c         | 4 ++++
>  arch/mips/kvm/mips.c       | 4 ++++
>  arch/powerpc/kvm/powerpc.c | 4 ++++
>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>  arch/x86/kvm/x86.c         | 4 ++++

What about adding them as static inlines in
arch/*/include/asm/kvm_host.h (except for arch/x86 of course)?

Paolo

>  include/linux/kvm_host.h   | 2 ++
>  virt/kvm/kvm_main.c        | 2 ++
>  7 files changed, 24 insertions(+)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index a99e0cd..9f788eb 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -288,6 +288,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  {
>  }
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	vcpu->cpu = cpu;
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index cd71141..2362df2 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1002,6 +1002,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  {
>  }
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
>  int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>  				  struct kvm_translation *tr)
>  {
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 4c79284..cbc432f 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -720,6 +720,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  	kvmppc_subarch_vcpu_uninit(vcpu);
>  }
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  #ifdef CONFIG_BOOKE
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index ce81eb2..a3c324e 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -555,6 +555,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  	/* Nothing todo */
>  }
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f1e22d..d7c214f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7146,6 +7146,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  		static_key_slow_dec(&kvm_no_apic_vcpu);
>  }
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
>  	if (type)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a4c33b3..ebd7236 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -624,6 +624,8 @@ void kvm_arch_exit(void);
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>  
> +void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
> +
>  void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 33712fb..d3c3ed0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
>  	if (vcpu->preempted)
>  		vcpu->preempted = false;
>  
> +	kvm_arch_sched_in(vcpu, cpu);
> +
>  	kvm_arch_vcpu_load(vcpu, cpu);
>  }
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-21  8:29   ` Paolo Bonzini
@ 2014-08-21 11:38     ` Radim Krčmář
  2014-08-21 12:27       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 11:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 10:29+0200, Paolo Bonzini:
> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> > Introduce preempt notifiers for architecture specific code.
> > Advantage over creating a new notifier in every arch is slightly simpler
> > code and guaranteed call order with respect to kvm_sched_in.
> > 
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  arch/arm/kvm/arm.c         | 4 ++++
> >  arch/mips/kvm/mips.c       | 4 ++++
> >  arch/powerpc/kvm/powerpc.c | 4 ++++
> >  arch/s390/kvm/kvm-s390.c   | 4 ++++
> >  arch/x86/kvm/x86.c         | 4 ++++
> 
> What about adding them as static inlines in
> arch/*/include/asm/kvm_host.h (except for arch/x86 of course)?

All empty arch functions are in '.c' files, so it seems better to follow
the same path.
(And have one refactoring patch if GCC does not optimize this.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU
  2014-08-21  8:25   ` Paolo Bonzini
@ 2014-08-21 11:38     ` Radim Krčmář
  0 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 11:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 10:25+0200, Paolo Bonzini:
> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> > +	if (ple_gap)
> > +		vmcs_write32(PLE_WINDOW, vmx->ple_window);
> 
> Maybe we can add a ple_window_dirty field?  It can be tested instead of
> ple_gap to avoid the vmwrite in the common case?

Sure, v3!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21  8:24   ` Paolo Bonzini
@ 2014-08-21 11:47     ` Radim Krčmář
  0 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 11:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 10:24+0200, Paolo Bonzini:
> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> > +static void update_ple_window_actual_max(void)
> > +{
> > +	ple_window_actual_max =
> > +			__shrink_ple_window(max(ple_window_max, ple_window),
> 
> Why the max() here?

To have ple_window act as a ple_window_min as well.
(When we are already preventing most stupid choices.)

And it's cheap ... I can add comment to this function :)

> > +			                    ple_window_grow, INT_MIN);
> 
> This should be 0 (in fact you can probably make everything unsigned now
> that you've sorted out the overflows).

Not in v2:
  val = min(vmx->ple_window, ple_window_actual_max);
  val += ple_window_grow;
  vmx->ple_window = val;
so we need to dip below zero to allow all possible grows.

(I'm not sure if anyone is ever going to use the additive option, so
 getting rid of it is also a valid choice -- code would be nicer.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21  8:26   ` Paolo Bonzini
@ 2014-08-21 11:54     ` Radim Krčmář
  2014-08-21 12:29       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 11:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 10:26+0200, Paolo Bonzini:
> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> > +static int __shrink_ple_window(int val, int shrinker, int minimum)
> 
> s/shrinker/factor/ or s/shrinker/param/ (shrinker has another meaning in
> the kernel).

True, thanks.

> > +{
> > +	if (shrinker < 1)
> > +		return ple_window;
> > +
> > +	if (shrinker < ple_window)
> > +		val /= shrinker;
> > +	else
> > +		val -= shrinker;
> > +
> > +	return max(val, minimum);
> 
> Any reason to use anything but ple_window as the minimum, even in
> update_ple_window_actual_max?

ple_window_actual_max needs to be one grow below the ple_window_max, so
it can be lower than ple_window.

> > +}
> > +
> > +static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +	int new;
> > +
> > +	if (grow)
> > +		new = __grow_ple_window(vmx->ple_window);
> > +	else
> > +		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
> > +		                          ple_window);
> > +
> > +	vmx->ple_window = max(new, ple_window);
> > +}
> > +#define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
> > +#define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
> 
> No macros please. :)

Guity as charged.
Using 0/1 or true/false in this context directly would be pretty bad ...
Is enum fine? (SHIRINK_PLE_WINDOW, GROW_PLE_WINDOW?)

(I can always make it into function pointers ;)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/6] KVM: trace kvm_ple_window
  2014-08-21  8:29   ` Paolo Bonzini
@ 2014-08-21 11:56     ` Radim Krčmář
  2014-08-21 13:22       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 11:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 10:29+0200, Paolo Bonzini:
> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> > Tracepoint for dynamic PLE window, fired on every potential change.
> > 
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  arch/x86/kvm/trace.h | 25 +++++++++++++++++++++++++
> >  arch/x86/kvm/vmx.c   |  8 +++++---
> >  arch/x86/kvm/x86.c   |  1 +
> >  3 files changed, 31 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index e850a7d..4b8e6cb 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
> >  		  __print_symbolic(__entry->host_clock, host_clocks))
> >  );
> >  
> > +TRACE_EVENT(kvm_ple_window,
> > +	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
> 
> s/int grow/bool grow/ (and similarly in TP_STRUCT__entry).

Ok.
(We are using 'int' for this in other tracepoints, so I guessed there is
 some hate agains bool.)

> > +	TP_ARGS(grow, vcpu_id, new, old),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(                 int,      grow         )
> > +		__field(        unsigned int,   vcpu_id         )
> > +		__field(                 int,       new         )
> > +		__field(                 int,       old         )
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->grow           = grow;
> > +		__entry->vcpu_id        = vcpu_id;
> > +		__entry->new            = new;
> > +		__entry->old            = old;
> > +	),
> > +
> > +	TP_printk("vcpu %u: ple_window %d %s %d",
> > +	          __entry->vcpu_id,
> > +	          __entry->new,
> > +	          __entry->grow ? "<+" : "<-",
> 
> ? "grow" : "shrink"

Can't say I haven't expected comment about that :)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-21 11:38     ` Radim Krčmář
@ 2014-08-21 12:27       ` Paolo Bonzini
  2014-08-21 12:50         ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21 12:27 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

Il 21/08/2014 13:38, Radim Krčmář ha scritto:
> 2014-08-21 10:29+0200, Paolo Bonzini:
>> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
>>> Introduce preempt notifiers for architecture specific code.
>>> Advantage over creating a new notifier in every arch is slightly simpler
>>> code and guaranteed call order with respect to kvm_sched_in.
>>>
>>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>>> ---
>>>  arch/arm/kvm/arm.c         | 4 ++++
>>>  arch/mips/kvm/mips.c       | 4 ++++
>>>  arch/powerpc/kvm/powerpc.c | 4 ++++
>>>  arch/s390/kvm/kvm-s390.c   | 4 ++++
>>>  arch/x86/kvm/x86.c         | 4 ++++
>>
>> What about adding them as static inlines in
>> arch/*/include/asm/kvm_host.h (except for arch/x86 of course)?
> 
> All empty arch functions are in '.c' files, so it seems better to follow
> the same path.
> (And have one refactoring patch if GCC does not optimize this.)

GCC certainly does not optimize this (unless you use LTO).

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21 11:54     ` Radim Krčmář
@ 2014-08-21 12:29       ` Paolo Bonzini
  2014-08-21 12:42         ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21 12:29 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

Il 21/08/2014 13:54, Radim Krčmář ha scritto:
> Guity as charged.
> Using 0/1 or true/false in this context directly would be pretty bad ...
> Is enum fine? (SHIRINK_PLE_WINDOW, GROW_PLE_WINDOW?)

I prefer good old Ctrl-C Ctrl-V (adjusted for your favorite editor). :)

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21 12:29       ` Paolo Bonzini
@ 2014-08-21 12:42         ` Radim Krčmář
  2014-08-21 13:18           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 12:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 14:29+0200, Paolo Bonzini:
> Il 21/08/2014 13:54, Radim Krčmář ha scritto:
> > Guity as charged.
> > Using 0/1 or true/false in this context directly would be pretty bad ...
> > Is enum fine? (SHIRINK_PLE_WINDOW, GROW_PLE_WINDOW?)
> 
> I prefer good old Ctrl-C Ctrl-V (adjusted for your favorite editor). :)

I, I should be able to do it, but nightmares are going to last a while.

(And to lower chances of v4: are tracepoint macros, like for kvm_apic,
 frowned upon now?)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-21 12:27       ` Paolo Bonzini
@ 2014-08-21 12:50         ` Radim Krčmář
  2014-08-21 13:25           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 12:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 14:27+0200, Paolo Bonzini:
> Il 21/08/2014 13:38, Radim Krčmář ha scritto:
> > 2014-08-21 10:29+0200, Paolo Bonzini:
> >> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> >>> Introduce preempt notifiers for architecture specific code.
> >>> Advantage over creating a new notifier in every arch is slightly simpler
> >>> code and guaranteed call order with respect to kvm_sched_in.
> >>>
> >>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> >>> ---
> >>>  arch/arm/kvm/arm.c         | 4 ++++
> >>>  arch/mips/kvm/mips.c       | 4 ++++
> >>>  arch/powerpc/kvm/powerpc.c | 4 ++++
> >>>  arch/s390/kvm/kvm-s390.c   | 4 ++++
> >>>  arch/x86/kvm/x86.c         | 4 ++++
> >>
> >> What about adding them as static inlines in
> >> arch/*/include/asm/kvm_host.h (except for arch/x86 of course)?
> > 
> > All empty arch functions are in '.c' files, so it seems better to follow
> > the same path.
> > (And have one refactoring patch if GCC does not optimize this.)
> 
> GCC certainly does not optimize this (unless you use LTO).

I see LTO patches in next ... do we want to move every empty arch
function into headers?
(It is probably going to take LTO few years to be enabled by default.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21 12:42         ` Radim Krčmář
@ 2014-08-21 13:18           ` Paolo Bonzini
  2014-08-21 13:46             ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21 13:18 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

Il 21/08/2014 14:42, Radim Krčmář ha scritto:
> 2014-08-21 14:29+0200, Paolo Bonzini:
>> Il 21/08/2014 13:54, Radim Krčmář ha scritto:
>>> Guity as charged.
>>> Using 0/1 or true/false in this context directly would be pretty bad ...
>>> Is enum fine? (SHIRINK_PLE_WINDOW, GROW_PLE_WINDOW?)
>>
>> I prefer good old Ctrl-C Ctrl-V (adjusted for your favorite editor). :)
> 
> I, I should be able to do it, but nightmares are going to last a while.
> 
> (And to lower chances of v4: are tracepoint macros, like for kvm_apic,
>  frowned upon now?)

Nah, go ahead.  But is there any example of #define like what you were
doing?  I tried grepping for "#define.*[a-z]\(" and didn't get anything
similar.

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/6] KVM: trace kvm_ple_window
  2014-08-21 11:56     ` Radim Krčmář
@ 2014-08-21 13:22       ` Paolo Bonzini
  2014-08-21 13:49         ` Radim Krčmář
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21 13:22 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

Il 21/08/2014 13:56, Radim Krčmář ha scritto:
> 2014-08-21 10:29+0200, Paolo Bonzini:
>> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
>>> Tracepoint for dynamic PLE window, fired on every potential change.
>>>
>>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>>> ---
>>>  arch/x86/kvm/trace.h | 25 +++++++++++++++++++++++++
>>>  arch/x86/kvm/vmx.c   |  8 +++++---
>>>  arch/x86/kvm/x86.c   |  1 +
>>>  3 files changed, 31 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>>> index e850a7d..4b8e6cb 100644
>>> --- a/arch/x86/kvm/trace.h
>>> +++ b/arch/x86/kvm/trace.h
>>> @@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
>>>  		  __print_symbolic(__entry->host_clock, host_clocks))
>>>  );
>>>  
>>> +TRACE_EVENT(kvm_ple_window,
>>> +	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
>>
>> s/int grow/bool grow/ (and similarly in TP_STRUCT__entry).
> 
> Ok.
> (We are using 'int' for this in other tracepoints, so I guessed there is
>  some hate agains bool.)

Yeah, there are many ints, but there are also some bools. :)

Of course "bool" is only a no-brainer if there is a clear direction to
use (e.g. "bool has_error"), and indeed this case is less obvious.  But
there are some like this one (e.g. bool gpa_match).

Keep the int if you prefer it that way.

Paolo

>>> +	TP_ARGS(grow, vcpu_id, new, old),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__field(                 int,      grow         )
>>> +		__field(        unsigned int,   vcpu_id         )
>>> +		__field(                 int,       new         )
>>> +		__field(                 int,       old         )
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__entry->grow           = grow;
>>> +		__entry->vcpu_id        = vcpu_id;
>>> +		__entry->new            = new;
>>> +		__entry->old            = old;
>>> +	),
>>> +
>>> +	TP_printk("vcpu %u: ple_window %d %s %d",
>>> +	          __entry->vcpu_id,
>>> +	          __entry->new,
>>> +	          __entry->grow ? "<+" : "<-",
>>
>> ? "grow" : "shrink"
> 
> Can't say I haven't expected comment about that :)
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in
  2014-08-21 12:50         ` Radim Krčmář
@ 2014-08-21 13:25           ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-08-21 13:25 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

Il 21/08/2014 14:50, Radim Krčmář ha scritto:
> > > 
> > > All empty arch functions are in '.c' files, so it seems better to follow
> > > the same path.
> > > (And have one refactoring patch if GCC does not optimize this.)
> > 
> > GCC certainly does not optimize this (unless you use LTO).
> 
> I see LTO patches in next ... do we want to move every empty arch
> function into headers?

I wouldn't reject the patches.

(It would also save some lines of code, since in the headers it's common
to do "static inline void foo(void) {}" on a single line).

> (It is probably going to take LTO few years to be enabled by default.)

Indeed...  see also
http://lwn.net/SubscriberLink/608945/7763ad2aee106f1d/ from today's LWN
issue.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window
  2014-08-21 13:18           ` Paolo Bonzini
@ 2014-08-21 13:46             ` Radim Krčmář
  0 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 13:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 15:18+0200, Paolo Bonzini:
> Il 21/08/2014 14:42, Radim Krčmář ha scritto:
> > 2014-08-21 14:29+0200, Paolo Bonzini:
> >> Il 21/08/2014 13:54, Radim Krčmář ha scritto:
> >>> Guity as charged.
> >>> Using 0/1 or true/false in this context directly would be pretty bad ...
> >>> Is enum fine? (SHIRINK_PLE_WINDOW, GROW_PLE_WINDOW?)
> >>
> >> I prefer good old Ctrl-C Ctrl-V (adjusted for your favorite editor). :)
> > 
> > I, I should be able to do it, but nightmares are going to last a while.
> > 
> > (And to lower chances of v4: are tracepoint macros, like for kvm_apic,
> >  frowned upon now?)
> 
> Nah, go ahead.  But is there any example of #define like what you were
> doing?  I tried grepping for "#define.*[a-z]\(" and didn't get anything
> similar.

No, at least not in KVM code, it was the same idea as tracepoints:
less magical constants at the cost of (syntactic) indirection.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/6] KVM: trace kvm_ple_window
  2014-08-21 13:22       ` Paolo Bonzini
@ 2014-08-21 13:49         ` Radim Krčmář
  0 siblings, 0 replies; 28+ messages in thread
From: Radim Krčmář @ 2014-08-21 13:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu,
	Hui-Zhi, Christian Borntraeger

2014-08-21 15:22+0200, Paolo Bonzini:
> Il 21/08/2014 13:56, Radim Krčmář ha scritto:
> > 2014-08-21 10:29+0200, Paolo Bonzini:
> >> Il 20/08/2014 22:53, Radim Krčmář ha scritto:
> >>> +	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
> >>
> >> s/int grow/bool grow/ (and similarly in TP_STRUCT__entry).
> > 
> > Ok.
> > (We are using 'int' for this in other tracepoints, so I guessed there is
> >  some hate agains bool.)
> 
> Yeah, there are many ints, but there are also some bools. :)
> 
> Of course "bool" is only a no-brainer if there is a clear direction to
> use (e.g. "bool has_error"), and indeed this case is less obvious.  But
> there are some like this one (e.g. bool gpa_match).
> 
> Keep the int if you prefer it that way.

I actually like bool much better, but first few were only int and then I
stopped looking ...

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-08-21 13:49 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-20 20:53 [PATCH v2 0/6] Dynamic Pause Loop Exiting window Radim Krčmář
2014-08-20 20:53 ` [PATCH v2 1/6] KVM: add kvm_arch_sched_in Radim Krčmář
2014-08-21  8:29   ` Paolo Bonzini
2014-08-21 11:38     ` Radim Krčmář
2014-08-21 12:27       ` Paolo Bonzini
2014-08-21 12:50         ` Radim Krčmář
2014-08-21 13:25           ` Paolo Bonzini
2014-08-20 20:53 ` [PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
2014-08-20 20:53 ` [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU Radim Krčmář
2014-08-21  8:25   ` Paolo Bonzini
2014-08-21 11:38     ` Radim Krčmář
2014-08-20 20:53 ` [PATCH v2 4/6] KVM: VMX: dynamise PLE window Radim Krčmář
2014-08-21  8:24   ` Paolo Bonzini
2014-08-21 11:47     ` Radim Krčmář
2014-08-21  8:26   ` Paolo Bonzini
2014-08-21 11:54     ` Radim Krčmář
2014-08-21 12:29       ` Paolo Bonzini
2014-08-21 12:42         ` Radim Krčmář
2014-08-21 13:18           ` Paolo Bonzini
2014-08-21 13:46             ` Radim Krčmář
2014-08-20 20:53 ` [PATCH v2 5/6] KVM: trace kvm_ple_window Radim Krčmář
2014-08-21  8:29   ` Paolo Bonzini
2014-08-21 11:56     ` Radim Krčmář
2014-08-21 13:22       ` Paolo Bonzini
2014-08-21 13:49         ` Radim Krčmář
2014-08-20 20:53 ` [PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
2014-08-21  6:49 ` [PATCH v2 0/6] Dynamic Pause Loop Exiting window Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
2014-08-21  6:49   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.