All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further
@ 2019-05-20  8:18 Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic Wanpeng Li
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

Advance lapic timer tries to hidden the hypervisor overhead between the 
host emulated timer fires and the guest awares the timer is fired. However, 
it just hidden the time between apic_timer_fn/handle_preemption_timer -> 
wait_lapic_expire, instead of the real position of vmentry which is 
mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to 
advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between 
the end of wait_lapic_expire and before world switch on my haswell desktop.

This patchset tries to narrow the last gap(wait_lapic_expire -> world switch), 
it takes the real overhead time between apic_timer_fn/handle_preemption_timer
and before world switch into consideration when adaptively tuning timer 
advancement. The patchset can reduce 40% latency (~1600+ cycles to ~1000+ 
cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when 
testing busy waits.

v3 -> v4:
 * create timer_advance_ns debugfs entry iff lapic_in_kernel() 
 * keep if (guest_tsc < tsc_deadline) before the call to __wait_lapic_expire()

v2 -> v3:
 * expose 'kvm_timer.timer_advance_ns' to userspace
 * move the tracepoint below guest_exit_irqoff()
 * move wait_lapic_expire() before flushing the L1

v1 -> v2:
 * fix indent in patch 1/4
 * remove the wait_lapic_expire() tracepoint and expose by debugfs
 * move the call to wait_lapic_expire() into vmx.c and svm.c

Wanpeng Li (5):
  KVM: LAPIC: Extract adaptive tune timer advancement logic
  KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
  KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
  KVM: LAPIC: Delay trace advance expire delta
  KVM: LAPIC: Optimize timer latency further

 arch/x86/kvm/debugfs.c | 18 +++++++++++++++
 arch/x86/kvm/lapic.c   | 60 +++++++++++++++++++++++++++++---------------------
 arch/x86/kvm/lapic.h   |  3 ++-
 arch/x86/kvm/svm.c     |  4 ++++
 arch/x86/kvm/vmx/vmx.c |  4 ++++
 arch/x86/kvm/x86.c     |  9 ++++----
 6 files changed, 68 insertions(+), 30 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
@ 2019-05-20  8:18 ` Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow Wanpeng Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Extract adaptive tune timer advancement logic to a single function.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 57 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4924f83..89ac82d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1501,11 +1501,40 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
 	}
 }
 
-void wait_lapic_expire(struct kvm_vcpu *vcpu)
+static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
+				u64 guest_tsc, u64 tsc_deadline)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
-	u64 guest_tsc, tsc_deadline, ns;
+	u64 ns;
+
+	/* too early */
+	if (guest_tsc < tsc_deadline) {
+		ns = (tsc_deadline - guest_tsc) * 1000000ULL;
+		do_div(ns, vcpu->arch.virtual_tsc_khz);
+		timer_advance_ns -= min((u32)ns,
+			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+	} else {
+	/* too late */
+		ns = (guest_tsc - tsc_deadline) * 1000000ULL;
+		do_div(ns, vcpu->arch.virtual_tsc_khz);
+		timer_advance_ns += min((u32)ns,
+			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+	}
+
+	if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
+		apic->lapic_timer.timer_advance_adjust_done = true;
+	if (unlikely(timer_advance_ns > 5000)) {
+		timer_advance_ns = 0;
+		apic->lapic_timer.timer_advance_adjust_done = true;
+	}
+	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 guest_tsc, tsc_deadline;
 
 	if (apic->lapic_timer.expired_tscdeadline == 0)
 		return;
@@ -1521,28 +1550,8 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
-	if (!apic->lapic_timer.timer_advance_adjust_done) {
-		/* too early */
-		if (guest_tsc < tsc_deadline) {
-			ns = (tsc_deadline - guest_tsc) * 1000000ULL;
-			do_div(ns, vcpu->arch.virtual_tsc_khz);
-			timer_advance_ns -= min((u32)ns,
-				timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
-		} else {
-		/* too late */
-			ns = (guest_tsc - tsc_deadline) * 1000000ULL;
-			do_div(ns, vcpu->arch.virtual_tsc_khz);
-			timer_advance_ns += min((u32)ns,
-				timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
-		}
-		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
-			apic->lapic_timer.timer_advance_adjust_done = true;
-		if (unlikely(timer_advance_ns > 5000)) {
-			timer_advance_ns = 0;
-			apic->lapic_timer.timer_advance_adjust_done = true;
-		}
-		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
-	}
+	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+		adaptive_tune_timer_advancement(vcpu, guest_tsc, tsc_deadline);
 }
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic Wanpeng Li
@ 2019-05-20  8:18 ` Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace Wanpeng Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of
timer advancement), '-1' enables adaptive tuning starting from default
advancment of 1000ns. However, we should expose an int instead of an overflow
uint module parameter.

Before patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295

After patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:-1

Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 784c953..3eb77e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -143,7 +143,7 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
  * tuning, i.e. allows priveleged userspace to set an exact advancement time.
  */
 static int __read_mostly lapic_timer_advance_ns = -1;
-module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
+module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);
 
 static bool __read_mostly vector_hashing = true;
 module_param(vector_hashing, bool, S_IRUGO);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow Wanpeng Li
@ 2019-05-20  8:18 ` Wanpeng Li
  2019-05-20  8:18 ` [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta Wanpeng Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Expose per-vCPU timer_advance_ns to userspace, so it is able to 
query the auto-adjusted value.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/debugfs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index c19c7ed..a2f3432 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -9,12 +9,22 @@
  */
 #include <linux/kvm_host.h>
 #include <linux/debugfs.h>
+#include "lapic.h"
 
 bool kvm_arch_has_vcpu_debugfs(void)
 {
 	return true;
 }
 
+static int vcpu_get_timer_advance_ns(void *data, u64 *val)
+{
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+	*val = vcpu->arch.apic->lapic_timer.timer_advance_ns;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
+
 static int vcpu_get_tsc_offset(void *data, u64 *val)
 {
 	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
@@ -51,6 +61,14 @@ int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
 	if (!ret)
 		return -ENOMEM;
 
+	if (lapic_in_kernel(vcpu)) {
+		ret = debugfs_create_file("lapic_timer_advance_ns", 0444,
+								vcpu->debugfs_dentry,
+								vcpu, &vcpu_timer_advance_ns_fops);
+		if (!ret)
+			return -ENOMEM;
+	}
+
 	if (kvm_has_tsc_control) {
 		ret = debugfs_create_file("tsc-scaling-ratio", 0444,
 							vcpu->debugfs_dentry,
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-05-20  8:18 ` [PATCH v4 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace Wanpeng Li
@ 2019-05-20  8:18 ` Wanpeng Li
  2019-05-20 11:14   ` Paolo Bonzini
  2019-05-20  8:18 ` [PATCH v4 5/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

wait_lapic_expire() call was moved above guest_enter_irqoff() because of 
its tracepoint, which violated the RCU extended quiescent state invoked 
by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint 
below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot the delta before 
VM-Enter, but trace it after VM-Exit. This can help us to move 
wait_lapic_expire() just before vmentry in the later patch.

[1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[2] https://patchwork.kernel.org/patch/7821111/

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 14 +++++++-------
 arch/x86/kvm/lapic.h |  1 +
 arch/x86/kvm/x86.c   |  4 ++++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 89ac82d..6652928 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1502,27 +1502,27 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
 }
 
 static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
-				u64 guest_tsc, u64 tsc_deadline)
+				s64 advance_expire_delta)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
 	u64 ns;
 
 	/* too early */
-	if (guest_tsc < tsc_deadline) {
-		ns = (tsc_deadline - guest_tsc) * 1000000ULL;
+	if (advance_expire_delta < 0) {
+		ns = -advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
 		timer_advance_ns -= min((u32)ns,
 			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
 	} else {
 	/* too late */
-		ns = (guest_tsc - tsc_deadline) * 1000000ULL;
+		ns = advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
 		timer_advance_ns += min((u32)ns,
 			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
 	}
 
-	if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
+	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
 		apic->lapic_timer.timer_advance_adjust_done = true;
 	if (unlikely(timer_advance_ns > 5000)) {
 		timer_advance_ns = 0;
@@ -1545,13 +1545,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
-	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
+	apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
 
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
 	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
-		adaptive_tune_timer_advancement(vcpu, guest_tsc, tsc_deadline);
+		adaptive_tune_timer_advancement(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index d6d049b..3e72a25 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -32,6 +32,7 @@ struct kvm_timer {
 	u64 tscdeadline;
 	u64 expired_tscdeadline;
 	u32 timer_advance_ns;
+	s64 advance_expire_delta;
 	atomic_t pending;			/* accumulated triggered timers */
 	bool hv_timer_in_use;
 	bool timer_advance_adjust_done;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3eb77e7..ac7353f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8017,6 +8017,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 
 	guest_exit_irqoff();
+	if (lapic_in_kernel(vcpu) &&
+	    vcpu->arch.apic->lapic_timer.timer_advance_ns)
+		trace_kvm_wait_lapic_expire(vcpu->vcpu_id,
+			vcpu->arch.apic->lapic_timer.advance_expire_delta);
 
 	local_irq_enable();
 	preempt_enable();
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 5/5] KVM: LAPIC: Optimize timer latency further
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
                   ` (3 preceding siblings ...)
  2019-05-20  8:18 ` [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta Wanpeng Li
@ 2019-05-20  8:18 ` Wanpeng Li
  2019-05-20 11:16 ` [PATCH v4 0/5] " Paolo Bonzini
  2019-05-22  8:51 ` Wanpeng Li
  6 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Advance lapic timer tries to hidden the hypervisor overhead between the 
host emulated timer fires and the guest awares the timer is fired. However, 
it just hidden the time between apic_timer_fn/handle_preemption_timer -> 
wait_lapic_expire, instead of the real position of vmentry which is 
mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to 
advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between 
the end of wait_lapic_expire and before world switch on my haswell desktop.

This patch tries to narrow the last gap(wait_lapic_expire -> world switch), 
it takes the real overhead time between apic_timer_fn/handle_preemption_timer
and before world switch into consideration when adaptively tuning timer 
advancement. The patch can reduce 40% latency (~1600+ cycles to ~1000+ cycles 
on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing 
busy waits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c   | 3 ++-
 arch/x86/kvm/lapic.h   | 2 +-
 arch/x86/kvm/svm.c     | 4 ++++
 arch/x86/kvm/vmx/vmx.c | 4 ++++
 arch/x86/kvm/x86.c     | 3 ---
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6652928..db75ac5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1531,7 +1531,7 @@ static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
-void wait_lapic_expire(struct kvm_vcpu *vcpu)
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u64 guest_tsc, tsc_deadline;
@@ -1553,6 +1553,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
 		adaptive_tune_timer_advancement(vcpu, apic->lapic_timer.advance_expire_delta);
 }
+EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
 {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3e72a25..f974a3d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -220,7 +220,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
-void wait_lapic_expire(struct kvm_vcpu *vcpu);
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
 
 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a849dcb..e68c1b9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5632,6 +5632,10 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	clgi();
 	kvm_load_guest_xcr0(vcpu);
 
+	if (lapic_in_kernel(vcpu) &&
+		vcpu->arch.apic->lapic_timer.timer_advance_ns)
+		kvm_wait_lapic_expire(vcpu);
+
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
 	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 16d2a3e..57d0e57 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6433,6 +6433,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx_update_hv_timer(vcpu);
 
+	if (lapic_in_kernel(vcpu) &&
+		vcpu->arch.apic->lapic_timer.timer_advance_ns)
+		kvm_wait_lapic_expire(vcpu);
+
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
 	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac7353f..e6f05f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7959,9 +7959,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_entry(vcpu->vcpu_id);
-	if (lapic_in_kernel(vcpu) &&
-	    vcpu->arch.apic->lapic_timer.timer_advance_ns)
-		wait_lapic_expire(vcpu);
 	guest_enter_irqoff();
 
 	fpregs_assert_state_consistent();
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20  8:18 ` [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta Wanpeng Li
@ 2019-05-20 11:14   ` Paolo Bonzini
  2019-05-20 11:22     ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2019-05-20 11:14 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Sean Christopherson, Liran Alon

On 20/05/19 10:18, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> wait_lapic_expire() call was moved above guest_enter_irqoff() because of 
> its tracepoint, which violated the RCU extended quiescent state invoked 
> by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint 
> below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot the delta before 
> VM-Enter, but trace it after VM-Exit. This can help us to move 
> wait_lapic_expire() just before vmentry in the later patch.
> 
> [1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
> [2] https://patchwork.kernel.org/patch/7821111/

This is a bit confusing, since the delta is printed after the 
corresponding vmexit but the wait is done before the vmentry.  I think 
we can drop the tracepoint:

------------- 8< ----------------
From ae148d98d49b96b5222e2c78ac1b1e13cc526d71 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon, 20 May 2019 13:10:01 +0200
Subject: [PATCH] KVM: lapic: replace wait_lapic_expire tracepoint with
 restart_apic_timer

wait_lapic_expire() call was moved above guest_enter_irqoff() because of
its tracepoint, which violated the RCU extended quiescent state invoked
by guest_enter_irqoff()[1][2].

We would like to move wait_lapic_expire() just before vmentry, which would
place wait_lapic_expire() again inside the extended quiescent state.  Drop
the tracepoint, but add instead another one that can be useful and where
we can check the status of the adaptive tuning procedure.

[1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[2] https://patchwork.kernel.org/patch/7821111/

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

---
 arch/x86/kvm/lapic.c |  4 +++-
 arch/x86/kvm/trace.h | 15 +++++++--------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c12b090f4fad..8f05c1d0b486 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1545,7 +1545,6 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
-	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
 
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
@@ -1763,6 +1762,9 @@ static void start_sw_timer(struct kvm_lapic *apic)
 
 static void restart_apic_timer(struct kvm_lapic *apic)
 {
+	trace_kvm_restart_apic_timer(apic->vcpu->vcpu_id,
+				     apic->lapic_timer.timer_advance_ns);
+
 	preempt_disable();
 
 	if (!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4d47a2631d1f..f6e000038f3f 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -953,24 +953,23 @@
 		  __entry->flags)
 );
 
-TRACE_EVENT(kvm_wait_lapic_expire,
-	TP_PROTO(unsigned int vcpu_id, s64 delta),
-	TP_ARGS(vcpu_id, delta),
+TRACE_EVENT(kvm_restart_apic_timer,
+	TP_PROTO(unsigned int vcpu_id, u32 advance),
+	TP_ARGS(vcpu_id, advance),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	vcpu_id		)
-		__field(	s64,		delta		)
+		__field(	u32,		advance		)
 	),
 
 	TP_fast_assign(
 		__entry->vcpu_id	   = vcpu_id;
-		__entry->delta             = delta;
+		__entry->advance           = advance;
 	),
 
-	TP_printk("vcpu %u: delta %lld (%s)",
+	TP_printk("vcpu %u: advance %u",
 		  __entry->vcpu_id,
-		  __entry->delta,
-		  __entry->delta < 0 ? "early" : "late")
+		  __entry->advance)
 );
 
 TRACE_EVENT(kvm_enter_smm,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
                   ` (4 preceding siblings ...)
  2019-05-20  8:18 ` [PATCH v4 5/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
@ 2019-05-20 11:16 ` Paolo Bonzini
  2019-05-22  8:51 ` Wanpeng Li
  6 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-05-20 11:16 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm; +Cc: Radim Krčmář

On 20/05/19 10:18, Wanpeng Li wrote:
> Advance lapic timer tries to hidden the hypervisor overhead between the 
> host emulated timer fires and the guest awares the timer is fired. However, 
> it just hidden the time between apic_timer_fn/handle_preemption_timer -> 
> wait_lapic_expire, instead of the real position of vmentry which is 
> mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to 
> advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between 
> the end of wait_lapic_expire and before world switch on my haswell desktop.
> 
> This patchset tries to narrow the last gap(wait_lapic_expire -> world switch), 
> it takes the real overhead time between apic_timer_fn/handle_preemption_timer
> and before world switch into consideration when adaptively tuning timer 
> advancement. The patchset can reduce 40% latency (~1600+ cycles to ~1000+ 
> cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when 
> testing busy waits.
> 
> v3 -> v4:
>  * create timer_advance_ns debugfs entry iff lapic_in_kernel() 
>  * keep if (guest_tsc < tsc_deadline) before the call to __wait_lapic_expire()
> 
> v2 -> v3:
>  * expose 'kvm_timer.timer_advance_ns' to userspace
>  * move the tracepoint below guest_exit_irqoff()
>  * move wait_lapic_expire() before flushing the L1
> 
> v1 -> v2:
>  * fix indent in patch 1/4
>  * remove the wait_lapic_expire() tracepoint and expose by debugfs
>  * move the call to wait_lapic_expire() into vmx.c and svm.c
> 
> Wanpeng Li (5):
>   KVM: LAPIC: Extract adaptive tune timer advancement logic
>   KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
>   KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
>   KVM: LAPIC: Delay trace advance expire delta
>   KVM: LAPIC: Optimize timer latency further
> 
>  arch/x86/kvm/debugfs.c | 18 +++++++++++++++
>  arch/x86/kvm/lapic.c   | 60 +++++++++++++++++++++++++++++---------------------
>  arch/x86/kvm/lapic.h   |  3 ++-
>  arch/x86/kvm/svm.c     |  4 ++++
>  arch/x86/kvm/vmx/vmx.c |  4 ++++
>  arch/x86/kvm/x86.c     |  9 ++++----
>  6 files changed, 68 insertions(+), 30 deletions(-)
> 

Queued, thanks (2-3 for 5.2, the rest for 5.3).

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20 11:14   ` Paolo Bonzini
@ 2019-05-20 11:22     ` Wanpeng Li
  2019-05-20 11:33       ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20 11:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Mon, 20 May 2019 at 19:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/05/19 10:18, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > wait_lapic_expire() call was moved above guest_enter_irqoff() because of
> > its tracepoint, which violated the RCU extended quiescent state invoked
> > by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint
> > below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot the delta before
> > VM-Enter, but trace it after VM-Exit. This can help us to move
> > wait_lapic_expire() just before vmentry in the later patch.
> >
> > [1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
> > [2] https://patchwork.kernel.org/patch/7821111/
>
> This is a bit confusing, since the delta is printed after the
> corresponding vmexit but the wait is done before the vmentry.  I think
> we can drop the tracepoint:
>
> ------------- 8< ----------------
> From ae148d98d49b96b5222e2c78ac1b1e13cc526d71 Mon Sep 17 00:00:00 2001
> From: Paolo Bonzini <pbonzini@redhat.com>
> Date: Mon, 20 May 2019 13:10:01 +0200
> Subject: [PATCH] KVM: lapic: replace wait_lapic_expire tracepoint with
>  restart_apic_timer
>
> wait_lapic_expire() call was moved above guest_enter_irqoff() because of
> its tracepoint, which violated the RCU extended quiescent state invoked
> by guest_enter_irqoff()[1][2].
>
> We would like to move wait_lapic_expire() just before vmentry, which would
> place wait_lapic_expire() again inside the extended quiescent state.  Drop
> the tracepoint, but add instead another one that can be useful and where
> we can check the status of the adaptive tuning procedure.

https://lkml.org/lkml/2019/5/15/1435

Maybe Sean's comment is reasonable, per-vCPU debugfs entry for
adaptive tuning and wait_lapic_expire() tracepoint for hand tuning.

Regards,
Wanpeng Li

>
> [1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
> [2] https://patchwork.kernel.org/patch/7821111/
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> ---
>  arch/x86/kvm/lapic.c |  4 +++-
>  arch/x86/kvm/trace.h | 15 +++++++--------
>  2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c12b090f4fad..8f05c1d0b486 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1545,7 +1545,6 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>         tsc_deadline = apic->lapic_timer.expired_tscdeadline;
>         apic->lapic_timer.expired_tscdeadline = 0;
>         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> -       trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
>
>         if (guest_tsc < tsc_deadline)
>                 __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> @@ -1763,6 +1762,9 @@ static void start_sw_timer(struct kvm_lapic *apic)
>
>  static void restart_apic_timer(struct kvm_lapic *apic)
>  {
> +       trace_kvm_restart_apic_timer(apic->vcpu->vcpu_id,
> +                                    apic->lapic_timer.timer_advance_ns);
> +
>         preempt_disable();
>
>         if (!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 4d47a2631d1f..f6e000038f3f 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -953,24 +953,23 @@
>                   __entry->flags)
>  );
>
> -TRACE_EVENT(kvm_wait_lapic_expire,
> -       TP_PROTO(unsigned int vcpu_id, s64 delta),
> -       TP_ARGS(vcpu_id, delta),
> +TRACE_EVENT(kvm_restart_apic_timer,
> +       TP_PROTO(unsigned int vcpu_id, u32 advance),
> +       TP_ARGS(vcpu_id, advance),
>
>         TP_STRUCT__entry(
>                 __field(        unsigned int,   vcpu_id         )
> -               __field(        s64,            delta           )
> +               __field(        u32,            advance         )
>         ),
>
>         TP_fast_assign(
>                 __entry->vcpu_id           = vcpu_id;
> -               __entry->delta             = delta;
> +               __entry->advance           = advance;
>         ),
>
> -       TP_printk("vcpu %u: delta %lld (%s)",
> +       TP_printk("vcpu %u: advance %u",
>                   __entry->vcpu_id,
> -                 __entry->delta,
> -                 __entry->delta < 0 ? "early" : "late")
> +                 __entry->advance)
>  );
>
>  TRACE_EVENT(kvm_enter_smm,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20 11:22     ` Wanpeng Li
@ 2019-05-20 11:33       ` Paolo Bonzini
  2019-05-20 11:36         ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2019-05-20 11:33 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On 20/05/19 13:22, Wanpeng Li wrote:
>>
>> We would like to move wait_lapic_expire() just before vmentry, which would
>> place wait_lapic_expire() again inside the extended quiescent state.  Drop
>> the tracepoint, but add instead another one that can be useful and where
>> we can check the status of the adaptive tuning procedure.
> https://lkml.org/lkml/2019/5/15/1435
> 
> Maybe Sean's comment is reasonable, per-vCPU debugfs entry for
> adaptive tuning and wait_lapic_expire() tracepoint for hand tuning.

Hmm, yeah, that makes sense.  The location of the tracepoint is a bit
weird, but I guess we can add a comment in the code.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20 11:33       ` Paolo Bonzini
@ 2019-05-20 11:36         ` Wanpeng Li
  2019-05-20 11:41           ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20 11:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Mon, 20 May 2019 at 19:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/05/19 13:22, Wanpeng Li wrote:
> >>
> >> We would like to move wait_lapic_expire() just before vmentry, which would
> >> place wait_lapic_expire() again inside the extended quiescent state.  Drop
> >> the tracepoint, but add instead another one that can be useful and where
> >> we can check the status of the adaptive tuning procedure.
> > https://lkml.org/lkml/2019/5/15/1435
> >
> > Maybe Sean's comment is reasonable, per-vCPU debugfs entry for
> > adaptive tuning and wait_lapic_expire() tracepoint for hand tuning.
>
> Hmm, yeah, that makes sense.  The location of the tracepoint is a bit
> weird, but I guess we can add a comment in the code.

Do you need me to post a new patchset? :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20 11:36         ` Wanpeng Li
@ 2019-05-20 11:41           ` Paolo Bonzini
  2019-05-20 11:45             ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2019-05-20 11:41 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On 20/05/19 13:36, Wanpeng Li wrote:
>> Hmm, yeah, that makes sense.  The location of the tracepoint is a bit
>> weird, but I guess we can add a comment in the code.
> Do you need me to post a new patchset? :)

No problem.  The final patch that I committed is this:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c12b090f4fad..f8615872ae64 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1502,27 +1502,27 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
 }
 
 static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
-					      u64 guest_tsc, u64 tsc_deadline)
+					      s64 advance_expire_delta)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
 	u64 ns;
 
 	/* too early */
-	if (guest_tsc < tsc_deadline) {
-		ns = (tsc_deadline - guest_tsc) * 1000000ULL;
+	if (advance_expire_delta < 0) {
+		ns = -advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
 		timer_advance_ns -= min((u32)ns,
 			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
 	} else {
 	/* too late */
-		ns = (guest_tsc - tsc_deadline) * 1000000ULL;
+		ns = advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
 		timer_advance_ns += min((u32)ns,
 			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
 	}
 
-	if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
+	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
 		apic->lapic_timer.timer_advance_adjust_done = true;
 	if (unlikely(timer_advance_ns > 5000)) {
 		timer_advance_ns = 0;
@@ -1545,13 +1545,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
-	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
+	apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
 
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
 	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
-		adjust_lapic_timer_advance(vcpu, guest_tsc, tsc_deadline);
+		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index d6d049ba3045..3e72a255543d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -32,6 +32,7 @@ struct kvm_timer {
 	u64 tscdeadline;
 	u64 expired_tscdeadline;
 	u32 timer_advance_ns;
+	s64 advance_expire_delta;
 	atomic_t pending;			/* accumulated triggered timers */
 	bool hv_timer_in_use;
 	bool timer_advance_adjust_done;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7e57de50a3c..35631505421c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8008,6 +8008,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 
 	guest_exit_irqoff();
+	if (lapic_in_kernel(vcpu)) {
+		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
+		if (delta != S64_MIN) {
+			trace_kvm_wait_lapic_expire(vcpu->vcpu_id, delta);
+			vcpu->arch.apic->lapic_timer.advance_expire_delta = S64_MIN;
+		}
+	}
 
 	local_irq_enable();
 	preempt_enable();

so that KVM tracks whether wait_lapic_expire was called, and do not
invoke the tracepoint if not.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta
  2019-05-20 11:41           ` Paolo Bonzini
@ 2019-05-20 11:45             ` Wanpeng Li
  0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-20 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Mon, 20 May 2019 at 19:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/05/19 13:36, Wanpeng Li wrote:
> >> Hmm, yeah, that makes sense.  The location of the tracepoint is a bit
> >> weird, but I guess we can add a comment in the code.
> > Do you need me to post a new patchset? :)
>
> No problem.  The final patch that I committed is this:
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c12b090f4fad..f8615872ae64 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1502,27 +1502,27 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
>  }
>
>  static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
> -                                             u64 guest_tsc, u64 tsc_deadline)
> +                                             s64 advance_expire_delta)
>  {
>         struct kvm_lapic *apic = vcpu->arch.apic;
>         u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
>         u64 ns;
>
>         /* too early */
> -       if (guest_tsc < tsc_deadline) {
> -               ns = (tsc_deadline - guest_tsc) * 1000000ULL;
> +       if (advance_expire_delta < 0) {
> +               ns = -advance_expire_delta * 1000000ULL;
>                 do_div(ns, vcpu->arch.virtual_tsc_khz);
>                 timer_advance_ns -= min((u32)ns,
>                         timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
>         } else {
>         /* too late */
> -               ns = (guest_tsc - tsc_deadline) * 1000000ULL;
> +               ns = advance_expire_delta * 1000000ULL;
>                 do_div(ns, vcpu->arch.virtual_tsc_khz);
>                 timer_advance_ns += min((u32)ns,
>                         timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
>         }
>
> -       if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
> +       if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
>                 apic->lapic_timer.timer_advance_adjust_done = true;
>         if (unlikely(timer_advance_ns > 5000)) {
>                 timer_advance_ns = 0;
> @@ -1545,13 +1545,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>         tsc_deadline = apic->lapic_timer.expired_tscdeadline;
>         apic->lapic_timer.expired_tscdeadline = 0;
>         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> -       trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
> +       apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
>
>         if (guest_tsc < tsc_deadline)
>                 __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
>
>         if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
> -               adjust_lapic_timer_advance(vcpu, guest_tsc, tsc_deadline);
> +               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
>  }
>
>  static void start_sw_tscdeadline(struct kvm_lapic *apic)
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index d6d049ba3045..3e72a255543d 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -32,6 +32,7 @@ struct kvm_timer {
>         u64 tscdeadline;
>         u64 expired_tscdeadline;
>         u32 timer_advance_ns;
> +       s64 advance_expire_delta;
>         atomic_t pending;                       /* accumulated triggered timers */
>         bool hv_timer_in_use;
>         bool timer_advance_adjust_done;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e7e57de50a3c..35631505421c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8008,6 +8008,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         ++vcpu->stat.exits;
>
>         guest_exit_irqoff();
> +       if (lapic_in_kernel(vcpu)) {
> +               s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
> +               if (delta != S64_MIN) {
> +                       trace_kvm_wait_lapic_expire(vcpu->vcpu_id, delta);
> +                       vcpu->arch.apic->lapic_timer.advance_expire_delta = S64_MIN;
> +               }
> +       }
>
>         local_irq_enable();
>         preempt_enable();
>
> so that KVM tracks whether wait_lapic_expire was called, and do not
> invoke the tracepoint if not.

Looks good to me, thank you. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further
  2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
                   ` (5 preceding siblings ...)
  2019-05-20 11:16 ` [PATCH v4 0/5] " Paolo Bonzini
@ 2019-05-22  8:51 ` Wanpeng Li
  6 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-05-22  8:51 UTC (permalink / raw)
  To: LKML, kvm; +Cc: Paolo Bonzini, Radim Krčmář

On Mon, 20 May 2019 at 16:18, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Advance lapic timer tries to hidden the hypervisor overhead between the
> host emulated timer fires and the guest awares the timer is fired. However,
> it just hidden the time between apic_timer_fn/handle_preemption_timer ->
> wait_lapic_expire, instead of the real position of vmentry which is
> mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to
> advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between
> the end of wait_lapic_expire and before world switch on my haswell desktop.
>
> This patchset tries to narrow the last gap(wait_lapic_expire -> world switch),
> it takes the real overhead time between apic_timer_fn/handle_preemption_timer
> and before world switch into consideration when adaptively tuning timer
> advancement. The patchset can reduce 40% latency (~1600+ cycles to ~1000+
> cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when
> testing busy waits.

Testing on a Skylake Server, w/ nohz=off, idle=poll in the guest.
Reduces average cyclictest latency from 3us to 2us.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-05-22  8:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-20  8:18 [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
2019-05-20  8:18 ` [PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic Wanpeng Li
2019-05-20  8:18 ` [PATCH v4 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow Wanpeng Li
2019-05-20  8:18 ` [PATCH v4 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace Wanpeng Li
2019-05-20  8:18 ` [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta Wanpeng Li
2019-05-20 11:14   ` Paolo Bonzini
2019-05-20 11:22     ` Wanpeng Li
2019-05-20 11:33       ` Paolo Bonzini
2019-05-20 11:36         ` Wanpeng Li
2019-05-20 11:41           ` Paolo Bonzini
2019-05-20 11:45             ` Wanpeng Li
2019-05-20  8:18 ` [PATCH v4 5/5] KVM: LAPIC: Optimize timer latency further Wanpeng Li
2019-05-20 11:16 ` [PATCH v4 0/5] " Paolo Bonzini
2019-05-22  8:51 ` Wanpeng Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.