* [PATCH 0/9] Dynamic Pause Loop Exiting window.
@ 2014-08-19 20:35 Radim Krčmář
  2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

PLE does not scale in its current form.  When increasing VCPU count
above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already
been well optimized for various workloads, so this series tries to
alleviate a different major problem while minimizing the chance of
regressions: we have too many useless PLE exits.

Just increasing the PLE window would help some cases, but it still
spirals out of control.  By increasing the window after every PLE exit,
we can limit the number of useless ones, so we don't reach the state
where CPUs spend 99% of the time waiting for a lock.

HP confirmed that this series avoids soft lockups and TSC sync errors on
large guests.

---
Design notes and questions:

Alternative to first two patches could be a new notifier.

All values are made changeable because defaults weren't selected after
weeks of benchmarking -- we can get improved performance by hardcoding
if someone is willing to do it.
(Or by presuming that no one is ever going to.)

Then, we can quite safely drop overflow checks: they are impossible to
hit with small increases and I don't think that anyone wants large ones.

Also, I'd argue against the last patch: it should be done in userspace,
but I'm not sure about Linux's policy.


Radim Krčmář (9):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-vcpu
  KVM: VMX: dynamise PLE window
  KVM: VMX: clamp PLE window
  KVM: trace kvm_ple_window grow/shrink
  KVM: VMX: abstract ple_window modifiers
  KVM: VMX: runtime knobs for dynamic PLE window
  KVM: VMX: automatic PLE window maximum

 arch/arm/kvm/arm.c              |  4 ++
 arch/mips/kvm/mips.c            |  4 ++
 arch/powerpc/kvm/powerpc.c      |  4 ++
 arch/s390/kvm/kvm-s390.c        |  4 ++
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              |  6 +++
 arch/x86/kvm/trace.h            | 29 +++++++++++++
 arch/x86/kvm/vmx.c              | 93 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  6 +++
 include/linux/kvm_host.h        |  2 +
 virt/kvm/kvm_main.c             |  2 +
 11 files changed, 153 insertions(+), 3 deletions(-)

-- 
2.0.4


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] KVM: add kvm_arch_sched_in
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-20  7:47   ` Christian Borntraeger
  2014-08-19 20:35 ` [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Introduce preempt notifiers for architecture-specific code.
The advantage over creating a new notifier in every arch is slightly
simpler code and a guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/arm/kvm/arm.c         | 4 ++++
 arch/mips/kvm/mips.c       | 4 ++++
 arch/powerpc/kvm/powerpc.c | 4 ++++
 arch/s390/kvm/kvm-s390.c   | 4 ++++
 arch/x86/kvm/x86.c         | 4 ++++
 include/linux/kvm_host.h   | 2 ++
 virt/kvm/kvm_main.c        | 2 ++
 7 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..9f788eb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -288,6 +288,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index cd71141..2362df2 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1002,6 +1002,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 				  struct kvm_translation *tr)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4c79284..cbc432f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -720,6 +720,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	kvmppc_subarch_vcpu_uninit(vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 #ifdef CONFIG_BOOKE
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ce81eb2..a3c324e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -555,6 +555,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	/* Nothing todo */
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..d7c214f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7146,6 +7146,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 		static_key_slow_dec(&kvm_no_apic_vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	if (type)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..ebd7236 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -624,6 +624,8 @@ void kvm_arch_exit(void);
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..d3c3ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
 	if (vcpu->preempted)
 		vcpu->preempted = false;
 
+	kvm_arch_sched_in(vcpu, cpu);
+
 	kvm_arch_vcpu_load(vcpu, cpu);
 }
 
-- 
2.0.4



* [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
  2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-19 20:35 ` [PATCH 3/9] KVM: VMX: make PLE window per-vcpu Radim Krčmář
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

The sched_in preempt notifier is available for x86; allow its use in
specific virtualization technologies as well.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/svm.c              | 6 ++++++
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 1 +
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..358e2f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,8 @@ struct kvm_x86_ops {
 	bool (*mpx_supported)(void);
 
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
+
+	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..4baf1bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4305,6 +4305,10 @@ static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 }
 
+static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -4406,6 +4410,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,
+
+	.sched_in = svm_sched_in,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..2b306f9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8846,6 +8846,10 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 	return X86EMUL_CONTINUE;
 }
 
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -8951,6 +8955,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.mpx_supported = vmx_mpx_supported,
 
 	.check_nested_events = vmx_check_nested_events,
+
+	.sched_in = vmx_sched_in,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7c214f..5696ee7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7148,6 +7148,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-- 
2.0.4



* [PATCH 3/9] KVM: VMX: make PLE window per-vcpu
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
  2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
  2014-08-19 20:35 ` [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-20  7:13   ` Paolo Bonzini
  2014-08-19 20:35 ` [PATCH 4/9] KVM: VMX: dynamise PLE window Radim Krčmář
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Change PLE window into per-vcpu variable, seeded from module parameter,
to allow greater flexibility.

Brings in a small overhead on every vmentry.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 I've been thinking about a general hierarchical per-vcpu variable model,
 but it's hard to have current performance and sane code.

 arch/x86/kvm/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2b306f9..eaa5574 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -484,6 +484,9 @@ struct vcpu_vmx {
 
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
+
+	/* Dynamic PLE window. */
+	int ple_window;
 };
 
 enum segment_cache_field {
@@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmcs_write32(PLE_WINDOW, ple_window);
+		vmx->ple_window = ple_window;
 	}
 
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
@@ -7387,6 +7391,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return;
 
+	if (ple_gap)
+		vmcs_write32(PLE_WINDOW, vmx->ple_window);
+
 	if (vmx->nested.sync_shadow_vmcs) {
 		copy_vmcs12_to_shadow(vmx);
 		vmx->nested.sync_shadow_vmcs = false;
-- 
2.0.4



* [PATCH 4/9] KVM: VMX: dynamise PLE window
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (2 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 3/9] KVM: VMX: make PLE window per-vcpu Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-19 20:35 ` [PATCH 5/9] KVM: VMX: clamp " Radim Krčmář
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Window is increased on every PLE exit and decreased on every sched_in.
The idea is that we don't want to PLE exit if there is no preemption
going on.

We do this in sched_in() because it does not hold the rq lock.

There are two new kernel parameters for changing the window:
 ple_window_grow and ple_window_shrink.
ple_window_grow affects the window on PLE exit and ple_window_shrink
does it on sched_in; depending on their values, the window is modified
like this: (ple_window is kvm_intel's global)

  ple_window_shrink/ |
  ple_window_grow    | PLE exit           | sched_in
  -------------------+--------------------+---------------------
  < 1                |  = ple_window      |  = ple_window
  < ple_window       | *= ple_window_grow | /= ple_window_shrink
  otherwise          | += ple_window_grow | -= ple_window_shrink

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index eaa5574..66259fd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,14 +125,25 @@ module_param(nested, bool, S_IRUGO);
  * Time is measured based on a counter that runs at the same rate as the TSC,
  * refer SDM volume 3b section 21.6.13 & 22.1.3.
  */
-#define KVM_VMX_DEFAULT_PLE_GAP    128
-#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+#define KVM_VMX_DEFAULT_PLE_GAP           128
+#define KVM_VMX_DEFAULT_PLE_WINDOW        4096
+#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW   2
+#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, int, S_IRUGO);
 
+/* Default doubles per-vcpu window every exit. */
+static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
+module_param(ple_window_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu window every exit to ple_window. */
+static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
+module_param(ple_window_shrink, int, S_IRUGO);
+
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
@@ -5680,12 +5691,47 @@ out:
 	return ret;
 }
 
+static void grow_ple_window(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int old = vmx->ple_window;
+	int new;
+
+	if (ple_window_grow < 1)
+		new = ple_window;
+	else if (ple_window_grow < ple_window)
+		new = old * ple_window_grow;
+	else
+		new = old + ple_window_grow;
+
+	vmx->ple_window = new;
+}
+
+static void shrink_ple_window(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int old = vmx->ple_window;
+	int new;
+
+	if (ple_window_shrink < 1)
+		new = ple_window;
+	else if (ple_window_shrink < ple_window)
+		new = old / ple_window_shrink;
+	else
+		new = old - ple_window_shrink;
+
+	vmx->ple_window = new;
+}
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	if (ple_gap)
+		grow_ple_window(vcpu);
+
 	skip_emulated_instruction(vcpu);
 	kvm_vcpu_on_spin(vcpu);
 
@@ -8855,6 +8901,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 
 void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	if (ple_gap)
+		shrink_ple_window(vcpu);
 }
 
 static struct kvm_x86_ops vmx_x86_ops = {
-- 
2.0.4



* [PATCH 5/9] KVM: VMX: clamp PLE window
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (3 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 4/9] KVM: VMX: dynamise PLE window Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-20  7:18   ` Paolo Bonzini
  2014-08-19 20:35 ` [PATCH 6/9] KVM: trace kvm_ple_window grow/shrink Radim Krčmář
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Modifications could produce unwanted PLE window values (low or negative).
Use ple_window and the maximal value that cannot overflow as bounds.

ple_window_max defaults to a very high value, but it would make sense to
set it to some fraction of the scheduler tick.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 66259fd..e1192fb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -144,6 +144,10 @@ module_param(ple_window_grow, int, S_IRUGO);
 static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
 module_param(ple_window_shrink, int, S_IRUGO);
 
+/* Default is to compute the maximum so we can never overflow. */
+static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
+module_param(ple_window_max, int, S_IRUGO);
+
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
@@ -5704,7 +5708,7 @@ static void grow_ple_window(struct kvm_vcpu *vcpu)
 	else
 		new = old + ple_window_grow;
 
-	vmx->ple_window = new;
+	vmx->ple_window = min(new, ple_window_max);
 }
 
 static void shrink_ple_window(struct kvm_vcpu *vcpu)
@@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
 	else
 		new = old - ple_window_shrink;
 
-	vmx->ple_window = new;
+	vmx->ple_window = max(new, ple_window);
 }
 
 /*
-- 
2.0.4



* [PATCH 6/9] KVM: trace kvm_ple_window grow/shrink
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (4 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 5/9] KVM: VMX: clamp " Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-19 20:35 ` [PATCH 7/9] KVM: VMX: abstract ple_window modifiers Radim Krčmář
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Tracepoint for dynamic PLE window, fired on every potential change.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/trace.h | 29 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c   |  4 ++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 34 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e850a7d..e4682f5 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -848,6 +848,35 @@ TRACE_EVENT(kvm_track_tsc,
 		  __print_symbolic(__entry->host_clock, host_clocks))
 );
 
+TRACE_EVENT(kvm_ple_window,
+	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
+	TP_ARGS(grow, vcpu_id, new, old),
+
+	TP_STRUCT__entry(
+		__field(                 int,      grow         )
+		__field(        unsigned int,   vcpu_id         )
+		__field(                 int,       new         )
+		__field(                 int,       old         )
+	),
+
+	TP_fast_assign(
+		__entry->grow           = grow;
+		__entry->vcpu_id        = vcpu_id;
+		__entry->new            = new;
+		__entry->old            = old;
+	),
+
+	TP_printk("vcpu %u: ple_window %d %s %d",
+	          __entry->vcpu_id,
+	          __entry->new,
+	          __entry->grow ? "<+" : "<-",
+	          __entry->old)
+);
+#define trace_kvm_ple_window_grow(vcpu_id, new, old) \
+	trace_kvm_ple_window(1, vcpu_id, new, old)
+#define trace_kvm_ple_window_shrink(vcpu_id, new, old) \
+	trace_kvm_ple_window(0, vcpu_id, new, old)
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* _TRACE_KVM_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e1192fb..a236a9f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5709,6 +5709,8 @@ static void grow_ple_window(struct kvm_vcpu *vcpu)
 		new = old + ple_window_grow;
 
 	vmx->ple_window = min(new, ple_window_max);
+
+	trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old);
 }
 
 static void shrink_ple_window(struct kvm_vcpu *vcpu)
@@ -5725,6 +5727,8 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
 		new = old - ple_window_shrink;
 
 	vmx->ple_window = max(new, ple_window);
+
+	trace_kvm_ple_window_shrink(vcpu->vcpu_id, vmx->ple_window, old);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5696ee7..814b20c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7648,3 +7648,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
-- 
2.0.4



* [PATCH 7/9] KVM: VMX: abstract ple_window modifiers
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (5 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 6/9] KVM: trace kvm_ple_window grow/shrink Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-20  7:02   ` Paolo Bonzini
  2014-08-19 20:35 ` [PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

They were almost identical and thus merged with a loathable macro.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 This solution is hopefully more acceptable than function pointers.

 arch/x86/kvm/vmx.c | 53 +++++++++++++++++++----------------------------------
 1 file changed, 19 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a236a9f..c6cfb71 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5694,42 +5694,27 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 out:
 	return ret;
 }
-
-static void grow_ple_window(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	int old = vmx->ple_window;
-	int new;
-
-	if (ple_window_grow < 1)
-		new = ple_window;
-	else if (ple_window_grow < ple_window)
-		new = old * ple_window_grow;
-	else
-		new = old + ple_window_grow;
-
-	vmx->ple_window = min(new, ple_window_max);
-
-	trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old);
+#define make_ple_window_modifier(type, oplt, opge, cmp, bound) \
+static void type##_ple_window(struct kvm_vcpu *vcpu) \
+{ \
+	struct vcpu_vmx *vmx = to_vmx(vcpu); \
+	int old = vmx->ple_window; \
+	int new; \
+\
+	if (ple_window_##type < 1) \
+		new = ple_window; \
+	else if (ple_window_##type < ple_window) \
+		new = old oplt ple_window_##type; \
+	else \
+		new = old opge ple_window_##type; \
+\
+	vmx->ple_window = cmp(new, bound); \
+\
+	trace_kvm_ple_window_##type(vcpu->vcpu_id, vmx->ple_window, old); \
 }
 
-static void shrink_ple_window(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	int old = vmx->ple_window;
-	int new;
-
-	if (ple_window_shrink < 1)
-		new = ple_window;
-	else if (ple_window_shrink < ple_window)
-		new = old / ple_window_shrink;
-	else
-		new = old - ple_window_shrink;
-
-	vmx->ple_window = max(new, ple_window);
-
-	trace_kvm_ple_window_shrink(vcpu->vcpu_id, vmx->ple_window, old);
-}
+make_ple_window_modifier(grow,   *, +, min, ple_window_max)
+make_ple_window_modifier(shrink, /, -, max, ple_window)
 
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
-- 
2.0.4



* [PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (6 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 7/9] KVM: VMX: abstract ple_window modifiers Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-19 20:35 ` [PATCH 9/9] KVM: VMX: automatic PLE window maximum Radim Krčmář
  2014-08-21  6:48   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  9 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

ple_window is updated on every vmentry, so there is no reason to keep it
read-only anymore.
The ple_window_* parameters weren't writable in order to prevent runtime
overflow races; those are mitigated by clamping the value of ple_window.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 If we decide to ignore insane overflows, last two hunks can be dropped.

 arch/x86/kvm/vmx.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6cfb71..d7f58e8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -134,19 +134,19 @@ static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
-module_param(ple_window, int, S_IRUGO);
+module_param(ple_window, int, S_IRUGO | S_IWUSR);
 
 /* Default doubles per-vcpu window every exit. */
 static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
-module_param(ple_window_grow, int, S_IRUGO);
+module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
 
 /* Default resets per-vcpu window every exit to ple_window. */
 static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
-module_param(ple_window_shrink, int, S_IRUGO);
+module_param(ple_window_shrink, int, S_IRUGO | S_IWUSR);
 
 /* Default is to compute the maximum so we can never overflow. */
 static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
-module_param(ple_window_max, int, S_IRUGO);
+module_param(ple_window_max, int, S_IRUGO | S_IWUSR);
 
 extern const ulong vmx_return;
 
@@ -5694,7 +5694,8 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 out:
 	return ret;
 }
-#define make_ple_window_modifier(type, oplt, opge, cmp, bound) \
+
+#define make_ple_window_modifier(type, oplt, opge) \
 static void type##_ple_window(struct kvm_vcpu *vcpu) \
 { \
 	struct vcpu_vmx *vmx = to_vmx(vcpu); \
@@ -5708,13 +5709,13 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
 	else \
 		new = old opge ple_window_##type; \
 \
-	vmx->ple_window = cmp(new, bound); \
+	vmx->ple_window = clamp(new, ple_window, ple_window_max); \
 \
 	trace_kvm_ple_window_##type(vcpu->vcpu_id, vmx->ple_window, old); \
 }
 
-make_ple_window_modifier(grow,   *, +, min, ple_window_max)
-make_ple_window_modifier(shrink, /, -, max, ple_window)
+make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
+make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
 
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
-- 
2.0.4



* [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
                   ` (7 preceding siblings ...)
  2014-08-19 20:35 ` [PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
@ 2014-08-19 20:35 ` Radim Krčmář
  2014-08-20  7:16   ` Paolo Bonzini
  2014-08-21  6:48   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  9 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-19 20:35 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

Every increase of ple_window_grow creates potential overflows.
They are not serious, because we clamp ple_window and userspace is
expected to fix ple_window_max within a second.
---
 arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d7f58e8..6873a0b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR);
 
 /* Default doubles per-vcpu window every exit. */
 static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
-module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
+static struct kernel_param_ops ple_window_grow_ops;
+module_param_cb(ple_window_grow, &ple_window_grow_ops,
+                &ple_window_grow, S_IRUGO | S_IWUSR);
 
 /* Default resets per-vcpu window every exit to ple_window. */
 static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
@@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
 make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
 make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
 
+static void clamp_ple_window_max(void)
+{
+	int maximum;
+
+	if (ple_window_grow < 1)
+		return;
+
+	if (ple_window_grow < ple_window)
+		maximum = INT_MAX / ple_window_grow;
+	else
+		maximum = INT_MAX - ple_window_grow;
+
+	ple_window_max = clamp(ple_window_max, ple_window, maximum);
+}
+
+static int set_ple_window_grow(const char *arg, const struct kernel_param *kp)
+{
+	int ret;
+
+	clamp_ple_window_max();
+	ret = param_set_int(arg, kp);
+
+	return ret;
+}
+
+static struct kernel_param_ops ple_window_grow_ops = {
+	.set = set_ple_window_grow,
+	.get = param_get_int,
+};
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
-- 
2.0.4



* Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers
  2014-08-19 20:35 ` [PATCH 7/9] KVM: VMX: abstract ple_window modifiers Radim Krčmář
@ 2014-08-20  7:02   ` Paolo Bonzini
  2014-08-20 12:25     ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  7:02 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

On 19/08/2014 22:35, Radim Krčmář wrote:
> They were almost identical and thus merged with a loathable macro.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  This solution is hopefully more acceptable than function pointers.

I think a small amount of duplication is not a problem.

Paolo

>  arch/x86/kvm/vmx.c | 53 +++++++++++++++++++----------------------------------
>  1 file changed, 19 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a236a9f..c6cfb71 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5694,42 +5694,27 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  out:
>  	return ret;
>  }
> -
> -static void grow_ple_window(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	int old = vmx->ple_window;
> -	int new;
> -
> -	if (ple_window_grow < 1)
> -		new = ple_window;
> -	else if (ple_window_grow < ple_window)
> -		new = old * ple_window_grow;
> -	else
> -		new = old + ple_window_grow;
> -
> -	vmx->ple_window = min(new, ple_window_max);
> -
> -	trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old);
> +#define make_ple_window_modifier(type, oplt, opge, cmp, bound) \
> +static void type##_ple_window(struct kvm_vcpu *vcpu) \
> +{ \
> +	struct vcpu_vmx *vmx = to_vmx(vcpu); \
> +	int old = vmx->ple_window; \
> +	int new; \
> +\
> +	if (ple_window_##type < 1) \
> +		new = ple_window; \
> +	else if (ple_window_##type < ple_window) \
> +		new = old oplt ple_window_##type; \
> +	else \
> +		new = old opge ple_window_##type; \
> +\
> +	vmx->ple_window = cmp(new, bound); \
> +\
> +	trace_kvm_ple_window_##type(vcpu->vcpu_id, vmx->ple_window, old); \
>  }
>  
> -static void shrink_ple_window(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	int old = vmx->ple_window;
> -	int new;
> -
> -	if (ple_window_shrink < 1)
> -		new = ple_window;
> -	else if (ple_window_shrink < ple_window)
> -		new = old / ple_window_shrink;
> -	else
> -		new = old - ple_window_shrink;
> -
> -	vmx->ple_window = max(new, ple_window);
> -
> -	trace_kvm_ple_window_shrink(vcpu->vcpu_id, vmx->ple_window, old);
> -}
> +make_ple_window_modifier(grow,   *, +, min, ple_window_max)
> +make_ple_window_modifier(shrink, /, -, max, ple_window)
>  
>  /*
>   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
> 
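[Editor's note: for readers unpacking the macro above, its two expansions are roughly equivalent to the following userspace sketch. The parameter values, and the omission of the vcpu plumbing and tracepoints, are illustrative stand-ins, not the kernel code.]

```c
#include <assert.h>

/* Stand-ins for the module parameters in the patch. */
static int ple_window = 4096;
static int ple_window_grow = 2;    /* < ple_window, so grow multiplies */
static int ple_window_shrink = 0;  /* < 1, so shrink resets to default */
static int ple_window_max = 1 << 30;

/* Roughly make_ple_window_modifier(grow, *, +, min, ple_window_max),
 * minus the vcpu plumbing and the tracepoint. */
static int grow_ple_window(int old)
{
	int new;

	if (ple_window_grow < 1)
		new = ple_window;
	else if (ple_window_grow < ple_window)
		new = old * ple_window_grow;
	else
		new = old + ple_window_grow;

	return new < ple_window_max ? new : ple_window_max;	/* min */
}

/* Roughly make_ple_window_modifier(shrink, /, -, max, ple_window). */
static int shrink_ple_window(int old)
{
	int new;

	if (ple_window_shrink < 1)
		new = ple_window;
	else if (ple_window_shrink < ple_window)
		new = old / ple_window_shrink;
	else
		new = old - ple_window_shrink;

	return new > ple_window ? new : ple_window;	/* max */
}
```

With the defaults above, every grow doubles the window up to ple_window_max and every shrink resets it to ple_window, matching the pre-macro functions the patch replaces.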


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu
  2014-08-19 20:35 ` [PATCH 3/9] KVM: VMX: make PLE window per-vcpu Radim Krčmář
@ 2014-08-20  7:13   ` Paolo Bonzini
  2014-08-20 12:26     ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  7:13 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> Change PLE window into per-vcpu variable, seeded from module parameter,
> to allow greater flexibility.
> 
> Brings in a small overhead on every vmentry.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  I've been thinking about a general hierarchical per-vcpu variable model,
>  but it's hard to have current performance and sane code.
> 
>  arch/x86/kvm/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2b306f9..eaa5574 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -484,6 +484,9 @@ struct vcpu_vmx {
>  
>  	/* Support for a guest hypervisor (nested VMX) */
>  	struct nested_vmx nested;
> +
> +	/* Dynamic PLE window. */
> +	int ple_window;
>  };
>  
>  enum segment_cache_field {
> @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  	if (ple_gap) {
>  		vmcs_write32(PLE_GAP, ple_gap);
>  		vmcs_write32(PLE_WINDOW, ple_window);

Is this necessary?

> +		vmx->ple_window = ple_window;
>  	}
>  
>  	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
> @@ -7387,6 +7391,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->emulation_required)
>  		return;
>  
> +	if (ple_gap)
> +		vmcs_write32(PLE_WINDOW, vmx->ple_window);
> +
>  	if (vmx->nested.sync_shadow_vmcs) {
>  		copy_vmcs12_to_shadow(vmx);
>  		vmx->nested.sync_shadow_vmcs = false;
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-19 20:35 ` [PATCH 9/9] KVM: VMX: automatic PLE window maximum Radim Krčmář
@ 2014-08-20  7:16   ` Paolo Bonzini
  2014-08-20  7:18     ` Paolo Bonzini
  2014-08-20 12:41     ` Radim Krčmář
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  7:16 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> Every increase of ple_window_grow creates potential overflows.
> They are not serious, because we clamp ple_window and userspace is
> expected to fix ple_window_max within a second.
> ---
>  arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d7f58e8..6873a0b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR);
>  
>  /* Default doubles per-vcpu window every exit. */
>  static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
> -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
> +static struct kernel_param_ops ple_window_grow_ops;
> +module_param_cb(ple_window_grow, &ple_window_grow_ops,
> +                &ple_window_grow, S_IRUGO | S_IWUSR);
>  
>  /* Default resets per-vcpu window every exit to ple_window. */
>  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
> @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
>  make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
>  make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
>  
> +static void clamp_ple_window_max(void)
> +{
> +	int maximum;
> +
> +	if (ple_window_grow < 1)
> +		return;
> +
> +	if (ple_window_grow < ple_window)
> +		maximum = INT_MAX / ple_window_grow;
> +	else
> +		maximum = INT_MAX - ple_window_grow;
> +
> +	ple_window_max = clamp(ple_window_max, ple_window, maximum);
> +}

I think avoiding overflows is better.  In fact, I think you should call
this function for ple_window_max too.

You could keep the ple_window_max variable to the user-set value.
Whenever ple_window_grow or ple_window_max are changed, you can set an
internal variable (let's call it ple_window_actual_max, but I'm not wed
to this name) to the computed value, and then do:

	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
		new = ple_window;
	else if (ple_window_grow < ple_window)
		new = max(ple_window_actual_max, old) * ple_window_grow;
	else
		new = max(ple_window_actual_max, old) + ple_window_grow;

(I think the || in the first "if" can be eliminated with some creativity
in clamp_ple_window_max).

Paolo

> +static int set_ple_window_grow(const char *arg, const struct kernel_param *kp)
> +{
> +	int ret;
> +
> +	clamp_ple_window_max();
> +	ret = param_set_int(arg, kp);
> +
> +	return ret;
> +}
> +
> +static struct kernel_param_ops ple_window_grow_ops = {
> +	.set = set_ple_window_grow,
> +	.get = param_get_int,
> +};
> +
>  /*
>   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>   * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
> 
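[Editor's note: a hedged sketch of the scheme proposed above, with the bound applied via min() per the self-correction later in the thread. The name ple_window_actual_max and the parameter values are illustrative, not the final kernel code; the point is that precomputing the derived maximum makes the grow arithmetic overflow-free.]

```c
#include <assert.h>
#include <limits.h>

static int ple_window = 4096;
static int ple_window_grow = 2;
static int ple_window_max = INT_MAX;	/* user-set value, kept as-is */
static int ple_window_actual_max;	/* derived, overflow-safe */

/* Recompute whenever ple_window_grow or ple_window_max changes:
 * one grow step applied at actual_max must stay within ple_window_max. */
static void update_ple_window_actual_max(void)
{
	if (ple_window_grow < 1)
		ple_window_actual_max = 0;	/* forces new = ple_window below */
	else if (ple_window_grow < ple_window)
		ple_window_actual_max = ple_window_max / ple_window_grow;
	else
		ple_window_actual_max = ple_window_max - ple_window_grow;
}

static int grow_ple_window(int old)
{
	int base;

	if (ple_window_actual_max < ple_window)
		return ple_window;	/* also covers ple_window_grow < 1 */

	base = old < ple_window_actual_max ? old : ple_window_actual_max;

	if (ple_window_grow < ple_window)
		return base * ple_window_grow;	/* cannot exceed INT_MAX */
	else
		return base + ple_window_grow;
}
```

Setting ple_window_actual_max to 0 when ple_window_grow < 1 is what lets the single comparison replace the || in the first if.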


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] KVM: VMX: clamp PLE window
  2014-08-19 20:35 ` [PATCH 5/9] KVM: VMX: clamp " Radim Krčmář
@ 2014-08-20  7:18   ` Paolo Bonzini
  2014-08-20 12:46     ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  7:18 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> Modifications could get unwanted values of PLE window. (low or negative)
> Use ple_window and the maximal value that cannot overflow as bounds.
> 
> ple_window_max defaults to a very high value, but it would make sense to
> set it to some fraction of the scheduler tick.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 66259fd..e1192fb 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -144,6 +144,10 @@ module_param(ple_window_grow, int, S_IRUGO);
>  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
>  module_param(ple_window_shrink, int, S_IRUGO);
>  
> +/* Default is to compute the maximum so we can never overflow. */
> +static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
> +module_param(ple_window_max, int, S_IRUGO);
> +
>  extern const ulong vmx_return;
>  
>  #define NR_AUTOLOAD_MSRS 8
> @@ -5704,7 +5708,7 @@ static void grow_ple_window(struct kvm_vcpu *vcpu)
>  	else
>  		new = old + ple_window_grow;
>  
> -	vmx->ple_window = new;
> +	vmx->ple_window = min(new, ple_window_max);
>  }

Please introduce a dynamic overflow-avoiding ple_window_max (like what
you have in patch 9) already in patch 4...

>  static void shrink_ple_window(struct kvm_vcpu *vcpu)
> @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
>  	else
>  		new = old - ple_window_shrink;
>  
> -	vmx->ple_window = new;
> +	vmx->ple_window = max(new, ple_window);

... and also squash this in patch 4.

This patch can then introduce the ple_window_max module parameter (using
module_param_cb to avoid overflows).

Paolo

>  }
>  
>  /*
> 
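[Editor's note: squashed into patch 4 as suggested, the two bounds amount to a single clamp of the freshly computed window. In isolation, with illustrative stand-ins for the module parameters, that is:]

```c
#include <assert.h>
#include <limits.h>

#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW 2

static int ple_window = 4096;
/* Default leaves room for one multiplicative grow without overflow. */
static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW;

/* Both bounds applied at once: never below ple_window (shrink floor),
 * never above ple_window_max (grow ceiling). */
static int clamp_ple_window(int new)
{
	if (new > ple_window_max)
		new = ple_window_max;
	if (new < ple_window)
		new = ple_window;
	return new;
}
```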


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20  7:16   ` Paolo Bonzini
@ 2014-08-20  7:18     ` Paolo Bonzini
  2014-08-20 12:41     ` Radim Krčmář
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  7:18 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 20/08/2014 09:16, Paolo Bonzini ha scritto:
> Il 19/08/2014 22:35, Radim Krčmář ha scritto:
>> Every increase of ple_window_grow creates potential overflows.
>> They are not serious, because we clamp ple_window and userspace is
>> expected to fix ple_window_max within a second.
>> ---
>>  arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++++++++-
>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index d7f58e8..6873a0b 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR);
>>  
>>  /* Default doubles per-vcpu window every exit. */
>>  static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
>> -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
>> +static struct kernel_param_ops ple_window_grow_ops;
>> +module_param_cb(ple_window_grow, &ple_window_grow_ops,
>> +                &ple_window_grow, S_IRUGO | S_IWUSR);
>>  
>>  /* Default resets per-vcpu window every exit to ple_window. */
>>  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
>> @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
>>  make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
>>  make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
>>  
>> +static void clamp_ple_window_max(void)
>> +{
>> +	int maximum;
>> +
>> +	if (ple_window_grow < 1)
>> +		return;
>> +
>> +	if (ple_window_grow < ple_window)
>> +		maximum = INT_MAX / ple_window_grow;
>> +	else
>> +		maximum = INT_MAX - ple_window_grow;
>> +
>> +	ple_window_max = clamp(ple_window_max, ple_window, maximum);
>> +}
> 
> I think avoiding overflows is better.  In fact, I think you should call
> this function for ple_window_max too.
> 
> You could keep the ple_window_max variable to the user-set value.
> Whenever ple_window_grow or ple_window_max are changed, you can set an
> internal variable (let's call it ple_window_actual_max, but I'm not wed
> to this name) to the computed value, and then do:
> 
> 	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
> 		new = ple_window;
> 	else if (ple_window_grow < ple_window)
> 		new = max(ple_window_actual_max, old) * ple_window_grow;
> 	else
> 		new = max(ple_window_actual_max, old) + ple_window_grow;

Ehm, this should of course be "min".

Paolo

> (I think the || in the first "if" can be eliminated with some creativity
> in clamp_ple_window_max).
> 
> Paolo
> 
>> +static int set_ple_window_grow(const char *arg, const struct kernel_param *kp)
>> +{
>> +	int ret;
>> +
>> +	clamp_ple_window_max();
>> +	ret = param_set_int(arg, kp);
>> +
>> +	return ret;
>> +}
>> +
>> +static struct kernel_param_ops ple_window_grow_ops = {
>> +	.set = set_ple_window_grow,
>> +	.get = param_get_int,
>> +};
>> +
>>  /*
>>   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>>   * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/9] KVM: add kvm_arch_sched_in
  2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
@ 2014-08-20  7:47   ` Christian Borntraeger
  2014-08-20 12:56     ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Borntraeger @ 2014-08-20  7:47 UTC (permalink / raw)
  To: Radim Krčmář, kvm
  Cc: linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

On 19/08/14 22:35, Radim Krčmář wrote:

> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
>  	if (vcpu->preempted)
>  		vcpu->preempted = false;
> 
> +	kvm_arch_sched_in(vcpu, cpu);
> +
>  	kvm_arch_vcpu_load(vcpu, cpu);
>  }
> 

Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and is architecture specific.

Christian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers
  2014-08-20  7:02   ` Paolo Bonzini
@ 2014-08-20 12:25     ` Radim Krčmář
  0 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 12:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 09:02+0200, Paolo Bonzini:
> Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> > They were almost identical and thus merged with a loathable macro.
> > 
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  This solution is hopefully more acceptable than function pointers.
> 
> I think a small amount of duplication is not a problem.

Ok, I'll drop this patch from from v2.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu
  2014-08-20  7:13   ` Paolo Bonzini
@ 2014-08-20 12:26     ` Radim Krčmář
  0 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 12:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 09:13+0200, Paolo Bonzini:
> Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> >  enum segment_cache_field {
> > @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> >  	if (ple_gap) {
> >  		vmcs_write32(PLE_GAP, ple_gap);
> >  		vmcs_write32(PLE_WINDOW, ple_window);
> 
> Is this necessary?

V2, thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20  7:16   ` Paolo Bonzini
  2014-08-20  7:18     ` Paolo Bonzini
@ 2014-08-20 12:41     ` Radim Krčmář
  2014-08-20 13:15       ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 12:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 09:16+0200, Paolo Bonzini:
> Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> > Every increase of ple_window_grow creates potential overflows.
> > They are not serious, because we clamp ple_window and userspace is
> > expected to fix ple_window_max within a second.
> > ---
> I think avoiding overflows is better.  In fact, I think you should call
> this function for ple_window_max too.

(Ack, I just wanted to avoid the worst userspace error, which is why
 PW_max hasn't changed when PW_grow got smaller and we could overflow.)

> You could keep the ple_window_max variable to the user-set value.
> Whenever ple_window_grow or ple_window_max are changed, you can set an
> internal variable (let's call it ple_window_actual_max, but I'm not wed
> to this name) to the computed value, and then do:
> 
> 	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
> 		new = ple_window;
> 	else if (ple_window_grow < ple_window)
> 		new = max(ple_window_actual_max, old) * ple_window_grow;
> 	else
> 		new = max(ple_window_actual_max, old) + ple_window_grow;

Oh, I like that this can get rid of all overflows, ple_window_actual_max
(PW_effective_max?) is going to be set to
"ple_window_max [/-] ple_window_grow" in v2.

> (I think the || in the first "if" can be eliminated with some creativity
> in clamp_ple_window_max).

To do it, we'll want to intercept changes to ple_window as well.
(I disliked this patch a lot even before :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] KVM: VMX: clamp PLE window
  2014-08-20  7:18   ` Paolo Bonzini
@ 2014-08-20 12:46     ` Radim Krčmář
  0 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 12:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 09:18+0200, Paolo Bonzini:
> Il 19/08/2014 22:35, Radim Krčmář ha scritto:
> > Modifications could get unwanted values of PLE window. (low or negative)
> > Use ple_window and the maximal value that cannot overflow as bounds.
> > 
> > ple_window_max defaults to a very high value, but it would make sense to
> > set it to some fraction of the scheduler tick.
> > 
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> Please introduce a dynamic overflow-avoiding ple_window_max (like what
> you have in patch 9) already in patch 4...
> 
> >  static void shrink_ple_window(struct kvm_vcpu *vcpu)
> > @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
> >  	else
> >  		new = old - ple_window_shrink;
> >  
> > -	vmx->ple_window = new;
> > +	vmx->ple_window = max(new, ple_window);
> 
> ... and also squash this in patch 4.
> 
> This patch can then introduce the ple_window_max module parameter (using
> module_param_cb to avoid overflows).

Will do.

---
It is going to make the patches slightly harder to review; are we doing it
because git doesn't bisect on series boundaries?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/9] KVM: add kvm_arch_sched_in
  2014-08-20  7:47   ` Christian Borntraeger
@ 2014-08-20 12:56     ` Radim Krčmář
  0 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 12:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kvm, linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Vinod Chegu, Hui-Zhi

2014-08-20 09:47+0200, Christian Borntraeger:
> On 19/08/14 22:35, Radim Krčmář wrote:
> 
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
> >  	if (vcpu->preempted)
> >  		vcpu->preempted = false;
> > 
> > +	kvm_arch_sched_in(vcpu, cpu);
> > +
> >  	kvm_arch_vcpu_load(vcpu, cpu);
> >  }
> > 
> 
> Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and is architecture specific.

kvm_arch_vcpu_load is also called from kvm_vcpu_ioctl, so we'd be
shrinking unnecessarily.
(sched_in gives us a bit of useful information about the state of the
 system, kvm_vcpu_ioctl not that much.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 12:41     ` Radim Krčmář
@ 2014-08-20 13:15       ` Paolo Bonzini
  2014-08-20 15:31         ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20 13:15 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 20/08/2014 14:41, Radim Krčmář ha scritto:
>> > 	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
>> > 		new = ple_window;
>> > 	else if (ple_window_grow < ple_window)
>> > 		new = max(ple_window_actual_max, old) * ple_window_grow;
>> > 	else
>> > 		new = max(ple_window_actual_max, old) + ple_window_grow;
> Oh, I like that this can get rid of all overflows, ple_window_actual_max
> (PW_effective_max?) is going to be set to
> "ple_window_max [/-] ple_window_grow" in v2.
> 
>> > (I think the || in the first "if" can be eliminated with some creativity
>> > in clamp_ple_window_max).
> To do it, we'll want to intercept changes to ple_window as well.
> (I disliked this patch a lot even before :)

What about setting ple_window_actual_max to 0 if ple_window_grow is 0
(instead of just returning)?

Then the "if (ple_window_actual_max < ple_window)" will always fail and
you'll go through "new = ple_window".  But perhaps it's more gross and
worthless than creative. :)

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 13:15       ` Paolo Bonzini
@ 2014-08-20 15:31         ` Radim Krčmář
  2014-08-20 15:34           ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 15:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 15:15+0200, Paolo Bonzini:
> Il 20/08/2014 14:41, Radim Krčmář ha scritto:
> >> > 	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
> >> > 		new = ple_window;
> >> > 	else if (ple_window_grow < ple_window)
> >> > 		new = max(ple_window_actual_max, old) * ple_window_grow;
> >> > 	else
> >> > 		new = max(ple_window_actual_max, old) + ple_window_grow;
> > Oh, I like that this can get rid of all overflows, ple_window_actual_max
> > (PW_effective_max?) is going to be set to
> > "ple_window_max [/-] ple_window_grow" in v2.
> > 
> >> > (I think the || in the first "if" can be eliminated with some creativity
> >> > in clamp_ple_window_max).
> > To do it, we'll want to intercept changes to ple_window as well.
> > (I disliked this patch a lot even before :)
> 
> What about setting ple_window_actual_max to 0 if ple_window_grow is 0
> (instead of just returning)?
> 
> Then the "if (ple_window_actual_max < ple_window)" will always fail and
> you'll go through "new = ple_window".  But perhaps it's more gross and
> worthless than creative. :)

That code can't use PW directly, because PW_actual_max needs to be one
PW_grow below PW_max, so I'd rather enforce minimal PW_actual_max.

Btw. without extra code, we are still going to overflow on races when
changing PW_grow, should they be covered as well?

(+ There is a bug in this patch -- clamp_ple_window_max() should be
 after param_set_int() ... damned unreviewed last-second changes.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 15:31         ` Radim Krčmář
@ 2014-08-20 15:34           ` Paolo Bonzini
  2014-08-20 16:01             ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20 15:34 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 20/08/2014 17:31, Radim Krčmář ha scritto:
> Btw. without extra code, we are still going to overflow on races when
> changing PW_grow, should they be covered as well?

You mean because there is no spinlock or similar protecting the changes?
 I guess you could use a seqlock.

Paolo
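
[Editor's note: to make the cost argument concrete, here is a minimal userspace analogue of the seqlock reader/writer protocol, using C11 atomics. This is not the kernel's seqlock_t, only the idea: a single writer bumps a sequence counter around updates to the coupled parameters, and readers retry if they observe an odd (in-progress) or changed sequence. All names are illustrative.]

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint seq;			/* even: quiescent, odd: write in progress */
static int pw_grow = 2;
static int pw_actual_max = 1 << 20;

/* Single-writer update: readers never see a torn (grow, actual_max) pair. */
static void pw_write(int grow, int actual_max)
{
	atomic_fetch_add_explicit(&seq, 1, memory_order_release);	/* odd */
	pw_grow = grow;
	pw_actual_max = actual_max;
	atomic_fetch_add_explicit(&seq, 1, memory_order_release);	/* even */
}

/* Reader: two loads of seq and a non-taken branch in the common case. */
static void pw_read(int *grow, int *actual_max)
{
	unsigned s;

	do {
		while ((s = atomic_load_explicit(&seq, memory_order_acquire)) & 1)
			;	/* writer in progress: spin */
		*grow = pw_grow;
		*actual_max = pw_actual_max;
	} while (atomic_load_explicit(&seq, memory_order_acquire) != s);
}
```

The fast path is exactly the cost Paolo describes: two accesses to a shared cache line and an untaken retry branch.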

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 15:34           ` Paolo Bonzini
@ 2014-08-20 16:01             ` Radim Krčmář
  2014-08-20 16:03               ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 16:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 17:34+0200, Paolo Bonzini:
> Il 20/08/2014 17:31, Radim Krčmář ha scritto:
> > Btw. without extra code, we are still going to overflow on races when
> > changing PW_grow, should they be covered as well?
> 
> You mean because there is no spinlock or similar protecting the changes?
>  I guess you could use a seqlock.

Yes, for example between a modification of ple_window
  new = min(old, PW_actual_max) * PW_grow
which gets compiled into something like this:
  1) tmp = min(old, PW_actual_max)
  2) new = tmp * PW_grow
and a write to increase PW_grow
  3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
  4) PW_grow = new_PW_grow
  5) PW_actual_max = PW_max / new_PW_grow

3 and 4 can execute between 1 and 2, which could overflow.

I don't think they are important enough to warrant a significant
performance hit of locking.
Or even more checks that would prevent it in a lockless way.

(I'd just see that the result is set to something legal and also drop
 line 3, because it does not help things that much.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 16:01             ` Radim Krčmář
@ 2014-08-20 16:03               ` Paolo Bonzini
  2014-08-20 16:26                 ` Radim Krčmář
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20 16:03 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

Il 20/08/2014 18:01, Radim Krčmář ha scritto:
> 2014-08-20 17:34+0200, Paolo Bonzini:
>> Il 20/08/2014 17:31, Radim Krčmář ha scritto:
>>> Btw. without extra code, we are still going to overflow on races when
>>> changing PW_grow, should they be covered as well?
>>
>> You mean because there is no spinlock or similar protecting the changes?
>>  I guess you could use a seqlock.
> 
> Yes, for example between a modification of ple_window
>   new = min(old, PW_actual_max) * PW_grow
> which gets compiled into something like this:
>   1) tmp = min(old, PW_actual_max)
>   2) new = tmp * PW_grow
> and a write to increase PW_grow
>   3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
>   4) PW_grow = new_PW_grow
>   5) PW_actual_max = PW_max / new_PW_grow
> 
> 3 and 4 can execute between 1 and 2, which could overflow.
> 
> I don't think they are important enough to warrant a significant
> performance hit of locking.

A seqlock just costs two memory accesses to the same (shared) cache line
as the PW data, and a non-taken branch.  I don't like code that is
unsafe by design...

Paolo

> Or even more checks that would prevent it in a lockless way.
> 
> (I'd just see that the result is set to something legal and also drop
>  line 3, because it does not help things that much.)
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
  2014-08-20 16:03               ` Paolo Bonzini
@ 2014-08-20 16:26                 ` Radim Krčmář
  0 siblings, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2014-08-20 16:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Gleb Natapov, Raghavendra KT, Vinod Chegu, Hui-Zhi

2014-08-20 18:03+0200, Paolo Bonzini:
> Il 20/08/2014 18:01, Radim Krčmář ha scritto:
> > 2014-08-20 17:34+0200, Paolo Bonzini:
> >> Il 20/08/2014 17:31, Radim Krčmář ha scritto:
> >>> Btw. without extra code, we are still going to overflow on races when
> >>> changing PW_grow, should they be covered as well?
> >>
> >> You mean because there is no spinlock or similar protecting the changes?
> >>  I guess you could use a seqlock.
> > 
> > Yes, for example between a modification of ple_window
> >   new = min(old, PW_actual_max) * PW_grow
> > which gets compiled into something like this:
> >   1) tmp = min(old, PW_actual_max)
> >   2) new = tmp * PW_grow
> > and a write to increase PW_grow
> >   3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
> >   4) PW_grow = new_PW_grow
> >   5) PW_actual_max = PW_max / new_PW_grow
> > 
> > 3 and 4 can execute between 1 and 2, which could overflow.
> > 
> > I don't think they are important enough to warrant a significant
> > performance hit of locking.
> 
> A seqlock just costs two memory accesses to the same (shared) cache line
> as the PW data, and a non-taken branch.

Oh, seqlock readers do not have to write to shared memory, so it is
acceptable ...

>                                         I don't like code that is
> unsafe by design...

I wouldn't say it is unsafe, because VCPU's PW is always greater than
module's PW. We are just going to PLE exit sooner than expected.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/9] Dynamic Pause Loop Exiting window.
  2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
@ 2014-08-21  6:48   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  2014-08-19 20:35 ` [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC) @ 2014-08-21  6:48 UTC (permalink / raw)
  To: Radim Krčmář,
	kvm, linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Mitchell, Lisa (MCLinux in Fort Collins)
  Cc: Vinod, Chegu

This patch has been tested by Lisa and me, and it was a success.

We created 4 VM guests and rebooted them every 10 minutes for around 12 hours, and this issue is gone with the patch.


Please add Lisa and me to the "tested by:" list.
Tested-by: Mitchell, Lisa <lisa.mitchell@hp.com>
Tested-by: Zhao, Hui Zhi <hui-zhi.zhao@hp.com>


Regards,
Steven Zhao

-----Original Message-----
From: Radim Krčmář [mailto:rkrcmar@redhat.com] 
Sent: Wednesday, August 20, 2014 4:35 AM
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini; Gleb Natapov; Raghavendra KT; Vinod, Chegu; Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
Subject: [PATCH 0/9] Dynamic Pause Loop Exiting window.

PLE does not scale in its current form.  When increasing VCPU count above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already been well optimized for various workloads, so this series tries to alleviate one different major problem while minimizing a chance of
regressions: we have too many useless PLE exits.

Just increasing PLE window would help some cases, but it still spirals out of control.  By increasing the window after every PLE exit, we can limit the amount of useless ones, so we don't reach the state where CPUs spend 99% of the time waiting for a lock.

HP confirmed that this series avoids soft lockups and TSC sync errors on large guests.

---
Design notes and questions:

Alternative to first two patches could be a new notifier.

All values are made changeable because defaults weren't selected after weeks of benchmarking -- we can get improved performance by hardcoding if someone is willing to do it.
(Or by presuming that noone is ever going to.)

Then, we can quite safely drop overflow checks: they are impossible to hit with small increases and I don't think that anyone wants large ones.

Also, I'd argue against the last patch: it should be done in userspace, but I'm not sure about Linux's policy.


Radim Krčmář (9):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-vcpu
  KVM: VMX: dynamise PLE window
  KVM: VMX: clamp PLE window
  KVM: trace kvm_ple_window grow/shrink
  KVM: VMX: abstract ple_window modifiers
  KVM: VMX: runtime knobs for dynamic PLE window
  KVM: VMX: automatic PLE window maximum

 arch/arm/kvm/arm.c              |  4 ++
 arch/mips/kvm/mips.c            |  4 ++
 arch/powerpc/kvm/powerpc.c      |  4 ++
 arch/s390/kvm/kvm-s390.c        |  4 ++
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              |  6 +++
 arch/x86/kvm/trace.h            | 29 +++++++++++++
 arch/x86/kvm/vmx.c              | 93 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  6 +++
 include/linux/kvm_host.h        |  2 +
 virt/kvm/kvm_main.c             |  2 +
 11 files changed, 153 insertions(+), 3 deletions(-)

--
2.0.4


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/9] Dynamic Pause Loop Exiting window.
@ 2014-08-21  6:48   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
  0 siblings, 0 replies; 29+ messages in thread
From: Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC) @ 2014-08-21  6:48 UTC (permalink / raw)
  To: Radim Krčmář,
	kvm, linux-kernel, Paolo Bonzini, Gleb Natapov, Raghavendra KT,
	Mitchell, Lisa (MCLinux in Fort Collins)
  Cc: Vinod, Chegu

This patch series has been tested by Lisa and me, and the test was a
success.

We created 4 VM guests and rebooted them every 10 minutes for around 12
hours, and this issue is gone with the patch.


Please add Lisa and me to the "Tested-by:" list.
Tested-by: Mitchell, Lisa <lisa.mitchell@hp.com>
Tested-by: Zhao, Hui Zhi <hui-zhi.zhao@hp.com>


Regards,
Steven Zhao

-----Original Message-----
From: Radim Krčmář [mailto:rkrcmar@redhat.com] 
Sent: Wednesday, August 20, 2014 4:35 AM
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini; Gleb Natapov; Raghavendra KT; Vinod, Chegu; Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
Subject: [PATCH 0/9] Dynamic Pause Loop Exiting window.

[quoted cover letter trimmed]


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-08-21  6:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
2014-08-20  7:47   ` Christian Borntraeger
2014-08-20 12:56     ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
2014-08-19 20:35 ` [PATCH 3/9] KVM: VMX: make PLE window per-vcpu Radim Krčmář
2014-08-20  7:13   ` Paolo Bonzini
2014-08-20 12:26     ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 4/9] KVM: VMX: dynamise PLE window Radim Krčmář
2014-08-19 20:35 ` [PATCH 5/9] KVM: VMX: clamp " Radim Krčmář
2014-08-20  7:18   ` Paolo Bonzini
2014-08-20 12:46     ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 6/9] KVM: trace kvm_ple_window grow/shrink Radim Krčmář
2014-08-19 20:35 ` [PATCH 7/9] KVM: VMX: abstract ple_window modifiers Radim Krčmář
2014-08-20  7:02   ` Paolo Bonzini
2014-08-20 12:25     ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
2014-08-19 20:35 ` [PATCH 9/9] KVM: VMX: automatic PLE window maximum Radim Krčmář
2014-08-20  7:16   ` Paolo Bonzini
2014-08-20  7:18     ` Paolo Bonzini
2014-08-20 12:41     ` Radim Krčmář
2014-08-20 13:15       ` Paolo Bonzini
2014-08-20 15:31         ` Radim Krčmář
2014-08-20 15:34           ` Paolo Bonzini
2014-08-20 16:01             ` Radim Krčmář
2014-08-20 16:03               ` Paolo Bonzini
2014-08-20 16:26                 ` Radim Krčmář
2014-08-21  6:48 ` [PATCH 0/9] Dynamic Pause Loop Exiting window Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
2014-08-21  6:48   ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
