* [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
@ 2016-05-20  1:44 Yunhong Jiang
  2016-05-20  1:44 ` [RFC PATCH 1/5] Add the kvm sched_out hook Yunhong Jiang
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:44 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

The VMX preemption timer is a VMX feature: it counts down, from the
value loaded at VM entry, while in VMX non-root operation. When the
timer counts down to zero, it stops and a VM exit occurs.

This series uses the VMX preemption timer for TSC deadline timer
virtualization. The VMX preemption timer is armed while the vCPU is
running, and a VM exit happens if the virtual TSC deadline timer
expires.

When the vCPU thread is scheduled out, TSC deadline timer
virtualization switches back to the current solution, i.e. an
hrtimer. It switches to the VMX preemption timer again when the vCPU
thread is scheduled in.

This solution replaces the host OS's complex hrtimer machinery, and
also the cost of host timer interrupt handling, with a single
preemption_timer VM exit. It fits well for NFV usage scenarios where
the vCPU is bound to an isolated pCPU, and for similar setups.

However, it may have a negative impact if the vCPU thread is
scheduled in/out very frequently, because it then switches to/from
the hrtimer emulation a lot. A module parameter is provided to turn
it on or off.
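
For reference, the core mechanism is to convert the remaining TSC
delta into a preemption timer countdown right before vmentry. A
minimal sketch of the idea (the real code is in patches 2/5 and 4/5;
"rate" here stands for a cached copy of the IA32_VMX_MISC[4:0] rate
field):

	/*
	 * Sketch only: arm the preemption timer so it fires when the
	 * guest's TSC deadline passes.  The timer ticks at the host
	 * TSC frequency divided by 2^rate.
	 */
	static void arm_hv_timer_sketch(struct kvm_vcpu *vcpu, u64 tscdeadline)
	{
		u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
		u64 delta = tscdeadline > guest_tsc ? tscdeadline - guest_tsc : 0;

		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, delta >> rate);
		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
				PIN_BASED_VMX_PREEMPTION_TIMER);
	}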

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>

Yunhong Jiang (5):
  Add the kvm sched_out hook
  Utilize the vmx preemption timer
  Separate the start_sw_tscdeadline
  Utilize the vmx preemption timer for tsc deadline timer
  Adding trace for the hwemul_timer

 arch/arm/include/asm/kvm_host.h     |   1 +
 arch/mips/include/asm/kvm_host.h    |   1 +
 arch/powerpc/include/asm/kvm_host.h |   1 +
 arch/s390/include/asm/kvm_host.h    |   1 +
 arch/x86/include/asm/kvm_host.h     |   5 ++
 arch/x86/kvm/lapic.c                | 163 +++++++++++++++++++++++++++++++-----
 arch/x86/kvm/lapic.h                |  10 +++
 arch/x86/kvm/trace.h                |  48 +++++++++++
 arch/x86/kvm/vmx.c                  |  78 ++++++++++++++++-
 arch/x86/kvm/x86.c                  |  13 +++
 include/linux/kvm_host.h            |   1 +
 virt/kvm/kvm_main.c                 |   1 +
 12 files changed, 300 insertions(+), 23 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 1/5] Add the kvm sched_out hook
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
@ 2016-05-20  1:44 ` Yunhong Jiang
  2016-05-20  1:45 ` [RFC PATCH 2/5] Utilize the vmx preemption timer Yunhong Jiang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:44 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

From: Yunhong Jiang <yunhong.jiang@gmail.com>

When a vCPU thread is scheduled out, the new sched_out hook will be
invoked. On x86, a corresponding kvm_x86_ops callback is added and
invoked from this hook.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
---
 arch/arm/include/asm/kvm_host.h     | 1 +
 arch/mips/include/asm/kvm_host.h    | 1 +
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/s390/include/asm/kvm_host.h    | 1 +
 arch/x86/include/asm/kvm_host.h     | 1 +
 arch/x86/kvm/vmx.c                  | 5 +++++
 arch/x86/kvm/x86.c                  | 6 ++++++
 include/linux/kvm_host.h            | 1 +
 virt/kvm/kvm_main.c                 | 1 +
 9 files changed, 18 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 0df6b1fc9655..8ea4b8a1a27b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -292,6 +292,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
 
 static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 6733ac575da4..6e432a3065e3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -812,6 +812,7 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec35af34a3fb..2a0ebb0a78eb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -725,6 +725,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslots *slots) {}
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_exit(void) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 37b9017c6a96..69084b874837 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -688,6 +688,7 @@ static inline void kvm_arch_check_processor_compat(void *rtn) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslots *slots) {}
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e0fbe7e70dc1..5e6b3ce7748f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -959,6 +959,7 @@ struct kvm_x86_ops {
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 
 	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+	void (*sched_out)(struct kvm_vcpu *vcpu);
 
 	/*
 	 * Arch-specific dirty logging hooks. These hooks are only supposed to
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e605d1ed334f..9e078ff29f86 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10656,6 +10656,10 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 		shrink_ple_window(vcpu);
 }
 
+static void vmx_sched_out(struct kvm_vcpu *vcpu)
+{
+}
+
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				     struct kvm_memory_slot *slot)
 {
@@ -11001,6 +11005,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.check_nested_events = vmx_check_nested_events,
 
 	.sched_in = vmx_sched_in,
+	.sched_out = vmx_sched_out,
 
 	.slot_enable_log_dirty = vmx_slot_enable_log_dirty,
 	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c805cf494154..5776473be362 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7727,6 +7727,12 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->sched_out)
+		kvm_x86_ops->sched_out(vcpu);
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	if (type)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b1fa8f11c95b..04c48c50faf3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -722,6 +722,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu);
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd4ac9d9e8f5..711edf7224b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3533,6 +3533,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 
 	if (current->state == TASK_RUNNING)
 		vcpu->preempted = true;
+	kvm_arch_sched_out(vcpu);
 	kvm_arch_vcpu_put(vcpu);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 2/5] Utilize the vmx preemption timer
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
  2016-05-20  1:44 ` [RFC PATCH 1/5] Add the kvm sched_out hook Yunhong Jiang
@ 2016-05-20  1:45 ` Yunhong Jiang
  2016-05-20  9:45   ` Paolo Bonzini
  2016-05-20  1:45 ` [RFC PATCH 3/5] Separate the start_sw_tscdeadline Yunhong Jiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:45 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

From: Yunhong Jiang <yunhong.jiang@gmail.com>

Add the basic VMX preemption timer functionality: check whether the
feature is supported, and set up/clear the VMX preemption timer. Also
add a module parameter that controls whether the VMX preemption timer
should be used.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++++
 arch/x86/kvm/lapic.c            |  7 +++++++
 arch/x86/kvm/vmx.c              | 45 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5e6b3ce7748f..8e58db20b3a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1006,6 +1006,10 @@ struct kvm_x86_ops {
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
+
+	int (*hw_emul_timer)(struct kvm_vcpu *vcpu);
+	void (*set_hwemul_timer)(struct kvm_vcpu *vcpu, u64 tsc);
+	void (*clear_hwemul_timer)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bbb5b283ff63..8908ee514f6c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -256,6 +256,13 @@ static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
 	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
 }
 
+static inline int hw_emul_timer(struct kvm_lapic *apic)
+{
+	if (kvm_x86_ops->hw_emul_timer)
+		return kvm_x86_ops->hw_emul_timer(apic->vcpu);
+	return 0;
+}
+
 static inline int apic_lvt_nmi_mode(u32 lvt_val)
 {
 	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9e078ff29f86..5475a7699ee5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -110,6 +110,9 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
 
 #define KVM_VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
 
+static bool __read_mostly hwemul_timer;
+module_param_named(hwemul_timer, hwemul_timer, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
 #define KVM_VM_CR0_ALWAYS_ON						\
@@ -1056,6 +1059,20 @@ static inline bool cpu_has_vmx_virtual_intr_delivery(void)
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 }
 
+static inline bool cpu_has_preemption_timer(void)
+{
+	return vmcs_config.pin_based_exec_ctrl &
+		PIN_BASED_VMX_PREEMPTION_TIMER;
+}
+
+static inline int cpu_preemption_timer_multi(void)
+{
+	u64 vmx_msr;
+
+	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+	return vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+}
+
 static inline bool cpu_has_vmx_posted_intr(void)
 {
 	return IS_ENABLED(CONFIG_X86_LOCAL_APIC) &&
@@ -3306,7 +3323,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 		return -EIO;
 
 	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
-	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR;
+	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
+		 PIN_BASED_VMX_PREEMPTION_TIMER;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
 				&_pin_based_exec_control) < 0)
 		return -EIO;
@@ -4779,6 +4797,8 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
 
 	if (!kvm_vcpu_apicv_active(&vmx->vcpu))
 		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR;
+	/* The preemption timer is enabled dynamically, keep it off by default */
+	pin_based_exec_ctrl &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
 	return pin_based_exec_ctrl;
 }
 
@@ -10650,6 +10670,25 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 	return X86EMUL_CONTINUE;
 }
 
+static int vmx_hwemul_timer(struct kvm_vcpu *vcpu)
+{
+	return hwemul_timer && cpu_has_preemption_timer();
+}
+
+static void vmx_set_hwemul_timer(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+			target_tsc >> cpu_preemption_timer_multi());
+	vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
+			PIN_BASED_VMX_PREEMPTION_TIMER);
+}
+
+static void vmx_clear_hwemul_timer(struct kvm_vcpu *vcpu)
+{
+	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
+			PIN_BASED_VMX_PREEMPTION_TIMER);
+}
+
 static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
 	if (ple_gap)
@@ -11018,6 +11057,10 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.pmu_ops = &intel_pmu_ops,
 
 	.update_pi_irte = vmx_update_pi_irte,
+
+	.hw_emul_timer = vmx_hwemul_timer,
+	.set_hwemul_timer = vmx_set_hwemul_timer,
+	.clear_hwemul_timer = vmx_clear_hwemul_timer,
 };
 
 static int __init vmx_init(void)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 3/5] Separate the start_sw_tscdeadline
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
  2016-05-20  1:44 ` [RFC PATCH 1/5] Add the kvm sched_out hook Yunhong Jiang
  2016-05-20  1:45 ` [RFC PATCH 2/5] Utilize the vmx preemption timer Yunhong Jiang
@ 2016-05-20  1:45 ` Yunhong Jiang
  2016-05-20 10:16   ` Paolo Bonzini
  2016-05-20  1:45 ` [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer Yunhong Jiang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:45 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

From: Yunhong Jiang <yunhong.jiang@gmail.com>

The code that starts TSC deadline timer virtualization will also be
used by the sched_out function when hwemul_timer is in use, so
extract it into a separate function. No logic changes.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
---
 arch/x86/kvm/lapic.c | 57 ++++++++++++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8908ee514f6c..12c416929d9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1320,6 +1320,36 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 		__delay(tsc_deadline - guest_tsc);
 }
 
+static void start_sw_tscdeadline(struct kvm_lapic *apic)
+{
+	u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
+	u64 ns = 0;
+	ktime_t expire;
+	struct kvm_vcpu *vcpu = apic->vcpu;
+	unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	unsigned long flags;
+	ktime_t now;
+
+	if (unlikely(!tscdeadline || !this_tsc_khz))
+		return;
+
+	local_irq_save(flags);
+
+	now = apic->lapic_timer.timer.base->get_time();
+	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	if (likely(tscdeadline > guest_tsc)) {
+		ns = (tscdeadline - guest_tsc) * 1000000ULL;
+		do_div(ns, this_tsc_khz);
+		expire = ktime_add_ns(now, ns);
+		expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
+		hrtimer_start(&apic->lapic_timer.timer,
+				expire, HRTIMER_MODE_ABS_PINNED);
+	} else
+		apic_timer_expired(apic);
+
+	local_irq_restore(flags);
+}
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now;
@@ -1366,32 +1396,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
 			   ktime_to_ns(ktime_add_ns(now,
 					apic->lapic_timer.period)));
 	} else if (apic_lvtt_tscdeadline(apic)) {
-		/* lapic timer in tsc deadline mode */
-		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
-		u64 ns = 0;
-		ktime_t expire;
-		struct kvm_vcpu *vcpu = apic->vcpu;
-		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
-		unsigned long flags;
-
-		if (unlikely(!tscdeadline || !this_tsc_khz))
-			return;
-
-		local_irq_save(flags);
-
-		now = apic->lapic_timer.timer.base->get_time();
-		guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
-		if (likely(tscdeadline > guest_tsc)) {
-			ns = (tscdeadline - guest_tsc) * 1000000ULL;
-			do_div(ns, this_tsc_khz);
-			expire = ktime_add_ns(now, ns);
-			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
-			hrtimer_start(&apic->lapic_timer.timer,
-				      expire, HRTIMER_MODE_ABS_PINNED);
-		} else
-			apic_timer_expired(apic);
-
-		local_irq_restore(flags);
+		start_sw_tscdeadline(apic);
 	}
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
                   ` (2 preceding siblings ...)
  2016-05-20  1:45 ` [RFC PATCH 3/5] Separate the start_sw_tscdeadline Yunhong Jiang
@ 2016-05-20  1:45 ` Yunhong Jiang
  2016-05-20 10:34   ` Paolo Bonzini
  2016-05-20  1:45 ` [RFC PATCH 5/5] Adding trace for the hwemul_timer Yunhong Jiang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:45 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

From: Yunhong Jiang <yunhong.jiang@gmail.com>

Utilize the VMX preemption timer for TSC deadline timer
virtualization. The VMX preemption timer is armed while the vCPU is
running, and a VM exit happens if the virtual TSC deadline timer
expires.

When the vCPU thread is scheduled out, TSC deadline timer
virtualization switches back to the current solution, i.e. an
hrtimer. It switches to the VMX preemption timer again when the vCPU
thread is scheduled in.

This solution avoids the host OS's complex hrtimer machinery, and
also the cost of host timer interrupt handling, replacing them with a
single preemption_timer VM exit. It fits well for NFV usage scenarios
where the vCPU is bound to an isolated pCPU, and for similar setups.

However, it may have a negative impact if the vCPU thread is
scheduled in/out very frequently, because it then switches to/from
the hrtimer emulation a lot.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
---
 arch/x86/kvm/lapic.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h |  10 +++++
 arch/x86/kvm/vmx.c   |  26 +++++++++++++
 arch/x86/kvm/x86.c   |   6 +++
 4 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 12c416929d9c..c9e32bf1a613 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1320,7 +1320,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 		__delay(tsc_deadline - guest_tsc);
 }
 
-static void start_sw_tscdeadline(struct kvm_lapic *apic)
+static void start_sw_tscdeadline(struct kvm_lapic *apic, int no_expire)
 {
 	u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
 	u64 ns = 0;
@@ -1337,7 +1337,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 
 	now = apic->lapic_timer.timer.base->get_time();
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
-	if (likely(tscdeadline > guest_tsc)) {
+	/* Don't trigger apic_timer_expired() if invoked from sched_out */
+	if (no_expire || likely(tscdeadline > guest_tsc)) {
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 		expire = ktime_add_ns(now, ns);
@@ -1396,9 +1397,110 @@ static void start_apic_timer(struct kvm_lapic *apic)
 			   ktime_to_ns(ktime_add_ns(now,
 					apic->lapic_timer.period)));
 	} else if (apic_lvtt_tscdeadline(apic)) {
-		start_sw_tscdeadline(apic);
+		/* lapic timer in tsc deadline mode */
+		if (hw_emul_timer(apic)) {
+			if (unlikely(!apic->lapic_timer.tscdeadline ||
+					!apic->vcpu->arch.virtual_tsc_khz))
+				return;
+
+			/* Expired timer will be checked on vcpu_run() */
+			apic->lapic_timer.hw_emulation = HWEMUL_ENABLED;
+		} else
+			start_sw_tscdeadline(apic, 0);
+	}
+}
+
+void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (apic->lapic_timer.hw_emulation)
+		return;
+
+	if (apic_lvtt_tscdeadline(apic) &&
+	    !atomic_read(&apic->lapic_timer.pending)) {
+		hrtimer_cancel(&apic->lapic_timer.timer);
+		/* In case the timer triggered in above small window */
+		if (!atomic_read(&apic->lapic_timer.pending))
+			apic->lapic_timer.hw_emulation = HWEMUL_ENABLED;
+	}
+}
+EXPORT_SYMBOL_GPL(switch_to_hw_lapic_timer);
+
+void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (!apic->lapic_timer.hw_emulation)
+		return;
+
+	if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
+		kvm_x86_ops->clear_hwemul_timer(vcpu);
+	apic->lapic_timer.hw_emulation = 0;
+
+	if (atomic_read(&apic->lapic_timer.pending))
+		return;
+
+	/* Don't trigger the apic_timer_expired() for deadlock */
+	start_sw_tscdeadline(apic, 1);
+}
+EXPORT_SYMBOL_GPL(switch_to_sw_lapic_timer);
+
+/*
+ * Check the hwemul timer status.
+ * -1: hwemul timer is not enabled
+ * >0: hwemul timer is not expired yet; the return value is the TSC delta
+ *  0: hwemul timer expired already
+ */
+int check_apic_hwemul_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (apic->lapic_timer.hw_emulation) {
+		u64 tscdeadline = apic->lapic_timer.tscdeadline;
+		u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
+		if (tscdeadline <= guest_tsc)
+			return 0;
+		else
+			return (tscdeadline - guest_tsc);
+	}
+	return -1;
+}
+
+int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu)
+{
+	if (!check_apic_hwemul_timer(vcpu)) {
+		struct kvm_lapic *apic = vcpu->arch.apic;
+
+		if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
+			kvm_x86_ops->clear_hwemul_timer(vcpu);
+		apic->lapic_timer.hw_emulation = 0;
+		atomic_inc(&apic->lapic_timer.pending);
+		kvm_set_pending_timer(vcpu);
+		return 1;
 	}
+
+	return 0;
+}
+
+int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
+{
+	s64 hwemultsc;
+
+	hwemultsc = check_apic_hwemul_timer(vcpu);
+	/* Just before vmentry, so inject even if expired */
+	if (hwemultsc >= 0) {
+		struct kvm_lapic *apic = vcpu->arch.apic;
+
+		kvm_x86_ops->set_hwemul_timer(vcpu, hwemultsc);
+		apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
+		return 1;
+	}
+
+	return 0;
 }
+EXPORT_SYMBOL_GPL(inject_pending_hwemul_timer);
 
 static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
 {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 891c6da7d4aa..5037d7bf609a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,6 +12,10 @@
 #define KVM_APIC_SHORT_MASK	0xc0000
 #define KVM_APIC_DEST_MASK	0x800
 
+#define HWEMUL_ENABLED		1
+/* The VMCS has been set for the vmx preemption timer */
+#define HWEMUL_INJECTED		2
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
@@ -20,6 +24,7 @@ struct kvm_timer {
 	u64 tscdeadline;
 	u64 expired_tscdeadline;
 	atomic_t pending;			/* accumulated triggered timers */
+	int hw_emulation;
 };
 
 struct kvm_lapic {
@@ -212,4 +217,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
 int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
 			const unsigned long *bitmap, u32 bitmap_size);
+void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu);
+void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu);
+int check_apic_hwemul_timer(struct kvm_vcpu *vcpu);
+int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu);
+int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5475a7699ee5..f3659ab45b30 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7572,6 +7572,23 @@ static int handle_pcommit(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int handle_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (apic->lapic_timer.hw_emulation != HWEMUL_INJECTED)
+		printk(KERN_WARNING "Preemption timer w/o hwemulation\n");
+
+	if (!atomic_read(&apic->lapic_timer.pending)) {
+		atomic_inc(&apic->lapic_timer.pending);
+		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+	}
+
+	apic->lapic_timer.hw_emulation = 0;
+	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
+			PIN_BASED_VMX_PREEMPTION_TIMER);
+	return 1;
+}
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -7623,6 +7640,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
 	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
+	[EXIT_REASON_PREEMPTION_TIMER]	      = handle_preemption_timer,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -8674,6 +8692,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vmx_set_interrupt_shadow(vcpu, 0);
 
+	inject_pending_hwemul_timer(vcpu);
+
 	if (vmx->guest_pkru_valid)
 		__write_pkru(vmx->guest_pkru);
 
@@ -10693,10 +10713,16 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
 	if (ple_gap)
 		shrink_ple_window(vcpu);
+	if (vmx_hwemul_timer(vcpu))
+		switch_to_hw_lapic_timer(vcpu);
 }
 
 static void vmx_sched_out(struct kvm_vcpu *vcpu)
 {
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (apic->lapic_timer.hw_emulation)
+		switch_to_sw_lapic_timer(vcpu);
 }
 
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5776473be362..a613bcfda59a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	local_irq_disable();
 
+	inject_expired_hwemul_timer(vcpu);
+
 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -6773,6 +6775,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 			break;
 
 		clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
+
+		/* Inject the hwemul timer if expired to avoid one VMExit */
+		inject_expired_hwemul_timer(vcpu);
+
 		if (kvm_cpu_has_pending_timer(vcpu))
 			kvm_inject_pending_timer_irqs(vcpu);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 5/5] Adding trace for the hwemul_timer
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
                   ` (3 preceding siblings ...)
  2016-05-20  1:45 ` [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer Yunhong Jiang
@ 2016-05-20  1:45 ` Yunhong Jiang
  2016-05-20 10:28   ` Paolo Bonzini
  2016-05-20  6:03 ` [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Jan Kiszka
  2016-05-20 18:18 ` Marcelo Tosatti
  6 siblings, 1 reply; 29+ messages in thread
From: Yunhong Jiang @ 2016-05-20  1:45 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

From: Yunhong Jiang <yunhong.jiang@gmail.com>

Add tracepoints for the hwemul_timer, e.g. for the switches between
the software (hrtimer) method and the hardware emulation method.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
---
 arch/x86/kvm/lapic.c |  5 +++++
 arch/x86/kvm/trace.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c   |  2 ++
 arch/x86/kvm/x86.c   |  1 +
 4 files changed, 56 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c9e32bf1a613..de7e20ac11f5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1416,6 +1416,7 @@ void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu)
 
 	if (apic->lapic_timer.hw_emulation)
 		return;
+	trace_kvm_hw_emul_sched_in(apic->lapic_timer.hw_emulation);
 
 	if (apic_lvtt_tscdeadline(apic) &&
 	    !atomic_read(&apic->lapic_timer.pending)) {
@@ -1434,6 +1435,8 @@ void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu)
 	if (!apic->lapic_timer.hw_emulation)
 		return;
 
+	trace_kvm_hw_emul_sched_out(apic->lapic_timer.hw_emulation);
+
 	if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
 		kvm_x86_ops->clear_hwemul_timer(vcpu);
 	apic->lapic_timer.hw_emulation = 0;
@@ -1495,6 +1498,8 @@ int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
 
 		kvm_x86_ops->set_hwemul_timer(vcpu, hwemultsc);
 		apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
+		trace_kvm_hw_emul_entry(vcpu->vcpu_id,
+				apic->lapic_timer.hw_emulation);
 		return 1;
 	}
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 8de925031b5c..b937f6055b48 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1348,6 +1348,54 @@ TRACE_EVENT(kvm_avic_unaccelerated_access,
 		  __entry->vec)
 );
 
+TRACE_EVENT(kvm_hw_emul_sched_in,
+       TP_PROTO(unsigned int prev_value),
+       TP_ARGS(prev_value),
+       TP_STRUCT__entry(
+               __field(        unsigned int, prev_value)
+       ),
+       TP_fast_assign(
+               __entry->prev_value = prev_value;
+       ),
+       TP_printk("prev hw_emu value %x\n", __entry->prev_value)
+);
+
+TRACE_EVENT(kvm_hw_emul_sched_out,
+       TP_PROTO(unsigned int prev_value),
+       TP_ARGS(prev_value),
+       TP_STRUCT__entry(
+               __field(        unsigned int, prev_value)
+       ),
+       TP_fast_assign(
+               __entry->prev_value = prev_value;
+       ),
+       TP_printk("prev hw_emu value %x\n", __entry->prev_value)
+);
+
+TRACE_EVENT(kvm_hw_emul_state,
+       TP_PROTO(unsigned int vcpu_id, bool exit, unsigned int hw_emulated),
+       TP_ARGS(vcpu_id, exit, hw_emulated),
+       TP_STRUCT__entry(
+               __field(        unsigned int, vcpu_id)
+               __field(        bool, exit)
+               __field(        unsigned int, hw_emulated)
+       ),
+       TP_fast_assign(
+               __entry->vcpu_id = vcpu_id;
+               __entry->exit = exit;
+               __entry->hw_emulated = hw_emulated;
+       ),
+       TP_printk("vcpu_id %x %s preemption timer %x\n",
+		 __entry->vcpu_id,
+		 __entry->exit ? "exit" : "entry",
+		 __entry->hw_emulated)
+);
+#define trace_kvm_hw_emul_entry(vcpu_id, hw_emulated) \
+	trace_kvm_hw_emul_state(vcpu_id, false, hw_emulated)
+
+#define trace_kvm_hw_emul_exit(vcpu_id, hw_emulated) \
+	trace_kvm_hw_emul_state(vcpu_id, true, hw_emulated)
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f3659ab45b30..8e9c9d845a35 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7576,6 +7576,8 @@ static int handle_preemption_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
+	trace_kvm_hw_emul_exit(vcpu->vcpu_id, apic->lapic_timer.hw_emulation);
+
 	if (apic->lapic_timer.hw_emulation != HWEMUL_INJECTED)
 		printk(KERN_WARNING "Preemption timer w/o hwemulation\n");
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a613bcfda59a..3aff9ac06733 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8449,3 +8449,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_hw_emul_state);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
                   ` (4 preceding siblings ...)
  2016-05-20  1:45 ` [RFC PATCH 5/5] Adding trace for the hwemul_timer Yunhong Jiang
@ 2016-05-20  6:03 ` Jan Kiszka
  2016-05-20  9:41   ` Paolo Bonzini
  2016-05-20 21:50   ` Jiang, Yunhong
  2016-05-20 18:18 ` Marcelo Tosatti
  6 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2016-05-20  6:03 UTC (permalink / raw)
  To: Yunhong Jiang, kvm; +Cc: rkrcmar, pbonzini

On 2016-05-20 03:44, Yunhong Jiang wrote:
> The VMX preemption timer is a VMX feature: it counts down, from the
> value loaded at VM entry, while in VMX non-root operation. When the
> timer counts down to zero, it stops and a VM exit occurs.
> 
> This series uses the VMX preemption timer for TSC deadline timer
> virtualization. The VMX preemption timer is armed while the vCPU is
> running, and a VM exit happens if the virtual TSC deadline timer
> expires.
> 
> When the vCPU thread is scheduled out, TSC deadline timer
> virtualization switches back to the current solution, i.e. an
> hrtimer. It switches to the VMX preemption timer again when the vCPU
> thread is scheduled in.
> 
> This solution replaces the host OS's complex hrtimer machinery, and
> also the cost of host timer interrupt handling, with a single
> preemption_timer VM exit. It fits well for NFV usage scenarios where
> the vCPU is bound to an isolated pCPU, and for similar setups.
> 
> However, it may have a negative impact if the vCPU thread is
> scheduled in/out very frequently, because it then switches to/from
> the hrtimer emulation a lot. A module parameter is provided to turn
> it on or off.

IIRC, the VMX preemption timer was broken on several older CPUs. That
was one reason we didn't use it to emulate the nested preemption
timer. Were all those CPUs also not exposing the TSC deadline timer?

In any case, just checking for the feature bit as you do in patch 2
shouldn't be enough to catch those CPUs. Or have microcode updates
been distributed by now that translate the errata into feature-bit
removals?

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20  6:03 ` [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Jan Kiszka
@ 2016-05-20  9:41   ` Paolo Bonzini
  2016-05-20 21:50   ` Jiang, Yunhong
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20  9:41 UTC (permalink / raw)
  To: Jan Kiszka, Yunhong Jiang, kvm; +Cc: rkrcmar



On 20/05/2016 08:03, Jan Kiszka wrote:
> IIRC, the VMX preemption timer was broken on several older CPUs. That
> was one reason we didn't use it to emulate the nested preemption
> timer. Were all those CPUs also not exposing the TSC deadline timer?
> 
> In any case, just checking for the feature bit as you do in patch 2
> shouldn't be enough to catch those CPUs. Or have microcode updates
> been distributed by now that translate the errata into feature-bit
> removals?

Yes, we need to include the list of affected CPU steppings  (VirtualBox
has it).  Leaving this out is fine for an RFC, though.
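
For illustration only, such a check could be shaped roughly like this;
the model/stepping values below are placeholders, not the real erratum
list:

	static bool cpu_has_broken_preemption_timer(void)
	{
		struct cpuinfo_x86 *c = &boot_cpu_data;

		if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 != 6)
			return false;
		/* Placeholder entry; fill in from the published errata. */
		return c->x86_model == 0x2a && c->x86_mask < 7;
	}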

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/5] Utilize the vmx preemption timer
  2016-05-20  1:45 ` [RFC PATCH 2/5] Utilize the vmx preemption timer Yunhong Jiang
@ 2016-05-20  9:45   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20  9:45 UTC (permalink / raw)
  To: Yunhong Jiang, kvm; +Cc: rkrcmar



On 20/05/2016 03:45, Yunhong Jiang wrote:
> From: Yunhong Jiang <yunhong.jiang@gmail.com>
> 
> Add the basic VMX preemption timer functionality: check whether the
> feature is supported, and set up/clear the VMX preemption timer. Also
> add a module parameter that controls whether the VMX preemption timer
> should be used.
> 
> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 ++++
>  arch/x86/kvm/lapic.c            |  7 +++++++
>  arch/x86/kvm/vmx.c              | 45 ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5e6b3ce7748f..8e58db20b3a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1006,6 +1006,10 @@ struct kvm_x86_ops {
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
>  	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
> +
> +	int (*hw_emul_timer)(struct kvm_vcpu *vcpu);
> +	void (*set_hwemul_timer)(struct kvm_vcpu *vcpu, u64 tsc);
> +	void (*clear_hwemul_timer)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index bbb5b283ff63..8908ee514f6c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -256,6 +256,13 @@ static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
>  	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
>  }
>  
> +static inline int hw_emul_timer(struct kvm_lapic *apic)
> +{
> +	if (kvm_x86_ops->hw_emul_timer)
> +		return kvm_x86_ops->hw_emul_timer(apic->vcpu);
> +	return 0;
> +}
> +
>  static inline int apic_lvt_nmi_mode(u32 lvt_val)
>  {
>  	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9e078ff29f86..5475a7699ee5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -110,6 +110,9 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
>  
>  #define KVM_VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
>  
> +static bool __read_mostly hwemul_timer;
> +module_param_named(hwemul_timer, hwemul_timer, bool, S_IRUGO);
> +
>  #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
>  #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
>  #define KVM_VM_CR0_ALWAYS_ON						\
> @@ -1056,6 +1059,20 @@ static inline bool cpu_has_vmx_virtual_intr_delivery(void)
>  		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>  }
>  
> +static inline bool cpu_has_preemption_timer(void)
> +{
> +	return vmcs_config.pin_based_exec_ctrl &
> +		PIN_BASED_VMX_PREEMPTION_TIMER;
> +}
> +
> +static inline int cpu_preemption_timer_multi(void)
> +{
> +	u64 vmx_msr;
> +
> +	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> +	return vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> +}
> +
>  static inline bool cpu_has_vmx_posted_intr(void)
>  {
>  	return IS_ENABLED(CONFIG_X86_LOCAL_APIC) &&
> @@ -3306,7 +3323,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  		return -EIO;
>  
>  	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
> -	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR;
> +	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
> +		 PIN_BASED_VMX_PREEMPTION_TIMER;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
>  				&_pin_based_exec_control) < 0)
>  		return -EIO;
> @@ -4779,6 +4797,8 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
>  
>  	if (!kvm_vcpu_apicv_active(&vmx->vcpu))
>  		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR;
> +	/* The preemption timer is enabled dynamically, keep it off by default */
> +	pin_based_exec_ctrl &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>  	return pin_based_exec_ctrl;
>  }
>  
> @@ -10650,6 +10670,25 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
>  	return X86EMUL_CONTINUE;
>  }
>  
> +static int vmx_hwemul_timer(struct kvm_vcpu *vcpu)
> +{
> +	return hwemul_timer && cpu_has_preemption_timer();
> +}

Please clear the vmx_x86_ops members instead if the preemption timer is
not usable.  Then you can check kvm_x86_ops->set_hwemul_timer and
kvm_x86_ops->clear_hwemul_timer instead of calling this function.
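
E.g., roughly, with a hypothetical helper run once at init time:

	static void vmx_setup_hv_timer_ops(void)
	{
		if (!hwemul_timer || !cpu_has_preemption_timer()) {
			vmx_x86_ops.hw_emul_timer = NULL;
			vmx_x86_ops.set_hwemul_timer = NULL;
			vmx_x86_ops.clear_hwemul_timer = NULL;
		}
	}

so that callers reduce to "if (kvm_x86_ops->set_hwemul_timer) ...".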

For what it's worth, I prefer "vmx_{set,cancel}_hv_timer" instead.

> +static void vmx_set_hwemul_timer(struct kvm_vcpu *vcpu, u64 target_tsc)

This is not a target_tsc, it is a delta_tsc.

> +{
> +	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> +			target_tsc >> cpu_preemption_timer_multi());

Please cache the value of cpu_preemption_timer_multi(); rdmsr is slow.
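
E.g. (sketch; the helper name is illustrative, to be called once from
hardware setup):

	static int preemption_timer_rate __read_mostly;

	static void cache_preemption_timer_rate(void)
	{
		u64 vmx_msr;

		rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
		preemption_timer_rate = vmx_msr &
			VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
	}

and then shift by preemption_timer_rate above.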

Thanks,

Paolo

> +	vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> +			PIN_BASED_VMX_PREEMPTION_TIMER);
> +}
> +
> +static void vmx_clear_hwemul_timer(struct kvm_vcpu *vcpu)
> +{
> +	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> +			PIN_BASED_VMX_PREEMPTION_TIMER);
> +}
> +
>  static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	if (ple_gap)
> @@ -11018,6 +11057,10 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.pmu_ops = &intel_pmu_ops,
>  
>  	.update_pi_irte = vmx_update_pi_irte,
> +
> +	.hw_emul_timer = vmx_hwemul_timer,
> +	.set_hwemul_timer = vmx_set_hwemul_timer,
> +	.clear_hwemul_timer = vmx_clear_hwemul_timer,
>  };
>  
>  static int __init vmx_init(void)
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 3/5] Separate the start_sw_tscdeadline
  2016-05-20  1:45 ` [RFC PATCH 3/5] Separate the start_sw_tscdeadline Yunhong Jiang
@ 2016-05-20 10:16   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20 10:16 UTC (permalink / raw)
  To: Yunhong Jiang, kvm; +Cc: rkrcmar



On 20/05/2016 03:45, Yunhong Jiang wrote:
> +static void start_sw_tscdeadline(struct kvm_lapic *apic)

Please rename to start_tscdeadline_hrtimer.

Paolo

> +{
> +	u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
> +	u64 ns = 0;
> +	ktime_t expire;
> +	struct kvm_vcpu *vcpu = apic->vcpu;
> +	unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
> +	unsigned long flags;
> +	ktime_t now;
> +
> +	if (unlikely(!tscdeadline || !this_tsc_khz))
> +		return;
> +
> +	local_irq_save(flags);
> +
> +	now = apic->lapic_timer.timer.base->get_time();
> +	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> +	if (likely(tscdeadline > guest_tsc)) {
> +		ns = (tscdeadline - guest_tsc) * 1000000ULL;
> +		do_div(ns, this_tsc_khz);
> +		expire = ktime_add_ns(now, ns);
> +		expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
> +		hrtimer_start(&apic->lapic_timer.timer,
> +				expire, HRTIMER_MODE_ABS_PINNED);
> +	} else
> +		apic_timer_expired(apic);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 5/5] Adding trace for the hwemul_timer
  2016-05-20  1:45 ` [RFC PATCH 5/5] Adding trace for the hwemul_timer Yunhong Jiang
@ 2016-05-20 10:28   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20 10:28 UTC (permalink / raw)
  To: Yunhong Jiang, kvm; +Cc: rkrcmar



On 20/05/2016 03:45, Yunhong Jiang wrote:
> From: Yunhong Jiang <yunhong.jiang@gmail.com>
> 
> Add tracepoints for the hwemul_timer, e.g. for the switches between
> the software (hrtimer) method and the hardware emulation method.
> 
> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> ---
>  arch/x86/kvm/lapic.c |  5 +++++
>  arch/x86/kvm/trace.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx.c   |  2 ++
>  arch/x86/kvm/x86.c   |  1 +
>  4 files changed, 56 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c9e32bf1a613..de7e20ac11f5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1416,6 +1416,7 @@ void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu)
>  
>  	if (apic->lapic_timer.hw_emulation)
>  		return;
> +	trace_kvm_hw_emul_sched_in(apic->lapic_timer.hw_emulation);
>  
>  	if (apic_lvtt_tscdeadline(apic) &&
>  	    !atomic_read(&apic->lapic_timer.pending)) {
> @@ -1434,6 +1435,8 @@ void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu)
>  	if (!apic->lapic_timer.hw_emulation)
>  		return;
>  
> +	trace_kvm_hw_emul_sched_out(apic->lapic_timer.hw_emulation);
> +
>  	if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
>  		kvm_x86_ops->clear_hwemul_timer(vcpu);
>  	apic->lapic_timer.hw_emulation = 0;
> @@ -1495,6 +1498,8 @@ int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
>  
>  		kvm_x86_ops->set_hwemul_timer(vcpu, hwemultsc);
>  		apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
> +		trace_kvm_hw_emul_entry(vcpu->vcpu_id,
> +				apic->lapic_timer.hw_emulation);
>  		return 1;
>  	}
>  
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 8de925031b5c..b937f6055b48 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -1348,6 +1348,54 @@ TRACE_EVENT(kvm_avic_unaccelerated_access,
>  		  __entry->vec)
>  );
>  
> +TRACE_EVENT(kvm_hw_emul_sched_in,
> +       TP_PROTO(unsigned int prev_value),
> +       TP_ARGS(prev_value),
> +       TP_STRUCT__entry(
> +               __field(        unsigned int, prev_value)
> +       ),
> +       TP_fast_assign(
> +               __entry->prev_value = prev_value;
> +       ),
> +       TP_printk("prev hw_emu value %x\n", __entry->prev_value)
> +);
> +
> +TRACE_EVENT(kvm_hw_emul_sched_out,
> +       TP_PROTO(unsigned int prev_value),
> +       TP_ARGS(prev_value),
> +       TP_STRUCT__entry(
> +               __field(        unsigned int, prev_value)
> +       ),
> +       TP_fast_assign(
> +               __entry->prev_value = prev_value;
> +       ),
> +       TP_printk("prev hw_emu value %x\n", __entry->prev_value)
> +);
> +
> +TRACE_EVENT(kvm_hw_emul_state,
> +       TP_PROTO(unsigned int vcpu_id, bool exit, unsigned int hw_emulated),
> +       TP_ARGS(vcpu_id, exit, hw_emulated),
> +       TP_STRUCT__entry(
> +               __field(        unsigned int, vcpu_id)
> +               __field(        bool, exit)
> +               __field(        unsigned int, hw_emulated)
> +       ),
> +       TP_fast_assign(
> +               __entry->vcpu_id = vcpu_id;
> +               __entry->exit = exit;
> +               __entry->hw_emulated = hw_emulated;
> +       ),
> +       TP_printk("vcpu_id %x %s preemption timer %x\n",
> +		 __entry->vcpu_id,
> +		 __entry->exit ? "exit" : "entry",
> +		 __entry->hw_emulated)
> +);
> +#define trace_kvm_hw_emul_entry(vcpu_id, hw_emulated) \
> +	trace_kvm_hw_emul_state(vcpu_id, false, hw_emulated)
> +
> +#define trace_kvm_hw_emul_exit(vcpu_id, hw_emulated) \
> +	trace_kvm_hw_emul_state(vcpu_id, true, hw_emulated)
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f3659ab45b30..8e9c9d845a35 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7576,6 +7576,8 @@ static int handle_preemption_timer(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
> +	trace_kvm_hw_emul_exit(vcpu->vcpu_id, apic->lapic_timer.hw_emulation);

I think this is not necessary.  However, I would like a tracepoint where
the preemption timer is set.

Also, please add the tracepoints together with the function.

Thanks,

Paolo

>  	if (apic->lapic_timer.hw_emulation != HWEMUL_INJECTED)
>  		printk(KERN_WARNING "Preemption timer w/o hwemulation\n");
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a613bcfda59a..3aff9ac06733 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8449,3 +8449,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_hw_emul_state);
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-20  1:45 ` [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer Yunhong Jiang
@ 2016-05-20 10:34   ` Paolo Bonzini
  2016-05-20 22:06     ` Jiang, Yunhong
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20 10:34 UTC (permalink / raw)
  To: Yunhong Jiang, kvm; +Cc: rkrcmar



On 20/05/2016 03:45, Yunhong Jiang wrote:
> From: Yunhong Jiang <yunhong.jiang@gmail.com>
> 
> Utilize the VMX preemption timer for TSC deadline timer
> virtualization. The VMX preemption timer is armed while the vCPU is
> running, and a VM exit happens if the virtual TSC deadline timer
> expires.
> 
> When the vCPU thread is scheduled out, TSC deadline timer
> virtualization switches back to the current solution, i.e. an
> hrtimer. It switches to the VMX preemption timer again when the vCPU
> thread is scheduled in.
> 
> This solution avoids the host OS's complex hrtimer machinery, and
> also the cost of host timer interrupt handling, replacing them with a
> single preemption_timer VM exit. It fits well for NFV usage scenarios
> where the vCPU is bound to an isolated pCPU, and for similar setups.
> 
> However, it may have a negative impact if the vCPU thread is
> scheduled in/out very frequently, because it then switches to/from
> the hrtimer emulation a lot.
> 
> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> ---
>  arch/x86/kvm/lapic.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/lapic.h |  10 +++++
>  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>  arch/x86/kvm/x86.c   |   6 +++
>  4 files changed, 147 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5776473be362..a613bcfda59a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	local_irq_disable();
>  
> +	inject_expired_hwemul_timer(vcpu);

Is this really fast enough (and does it trigger often enough) that it is
worth slowing down all vmenters?

I'd rather call inject_expired_hwemul_timer from the preemption timer
vmexit handler instead.  inject_pending_hwemul_timer will set the
preemption timer countdown to zero if the deadline of the guest LAPIC
timer has passed already.  This should be relatively rare.

This is the only major change I would like to see.  Everything else is
more cleanup and cosmetics.

>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> @@ -6773,6 +6775,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  			break;
>  
>  		clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
> +
> +		/* Inject the hwemul timer if expired to avoid one VMExit */
> +		inject_expired_hwemul_timer(vcpu);

Same as above.

>  		if (kvm_cpu_has_pending_timer(vcpu))
>  			kvm_inject_pending_timer_irqs(vcpu);
>  
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 12c416929d9c..c9e32bf1a613 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1320,7 +1320,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>  		__delay(tsc_deadline - guest_tsc);
>  }
>  
> -static void start_sw_tscdeadline(struct kvm_lapic *apic)
> +static void start_sw_tscdeadline(struct kvm_lapic *apic, int no_expire)
>  {
>  	u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
>  	u64 ns = 0;
> @@ -1337,7 +1337,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>  
>  	now = apic->lapic_timer.timer.base->get_time();
>  	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> -	if (likely(tscdeadline > guest_tsc)) {
> +	/* Not trigger the apic_timer if invoked from sched_out */

This comment is not necessary here.  Rather, you should explain the
deadlock better in switch_to_sw_lapic_timer.

> +	if (no_expire || likely(tscdeadline > guest_tsc)) {
>  		ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  		do_div(ns, this_tsc_khz);
>  		expire = ktime_add_ns(now, ns);
> @@ -1396,9 +1397,110 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  			   ktime_to_ns(ktime_add_ns(now,
>  					apic->lapic_timer.period)));
>  	} else if (apic_lvtt_tscdeadline(apic)) {
> -		start_sw_tscdeadline(apic);
> +		/* lapic timer in tsc deadline mode */
> +		if (hw_emul_timer(apic)) {
> +			if (unlikely(!apic->lapic_timer.tscdeadline ||
> +					!apic->vcpu->arch.virtual_tsc_khz))
> +				return;
> +
> +			/* Expired timer will be checked on vcpu_run() */
> +			apic->lapic_timer.hw_emulation = HWEMUL_ENABLED;
> +		} else
> +			start_sw_tscdeadline(apic, 0);
> +	}
> +}
> +
> +void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	if (apic->lapic_timer.hw_emulation)
> +		return;

This "if" never triggers.  Please change it to a WARN_ON?

> +	if (apic_lvtt_tscdeadline(apic) &&
> +	    !atomic_read(&apic->lapic_timer.pending)) {
> +		hrtimer_cancel(&apic->lapic_timer.timer);
> +		/* In case the timer triggered in above small window */
> +		if (!atomic_read(&apic->lapic_timer.pending))
> +			apic->lapic_timer.hw_emulation = HWEMUL_ENABLED;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(switch_to_hw_lapic_timer);
> +
> +void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	if (!apic->lapic_timer.hw_emulation)
> +		return;

This "if" never triggers.  Please change it to a WARN_ON.

> +	if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
> +		kvm_x86_ops->clear_hwemul_timer(vcpu);
> +	apic->lapic_timer.hw_emulation = 0;
> +	if (atomic_read(&apic->lapic_timer.pending))
> +		return;
> +
> +	/* Don't trigger the apic_timer_expired() for deadlock */

Can you explain this better?  For example:

	/*
	 * Calling apic_timer_expired() from here results in a deadlock,
	 * because...
	 */

> +	start_sw_tscdeadline(apic, 1);
> +}
> +EXPORT_SYMBOL_GPL(switch_to_sw_lapic_timer);
> +
> +/*
> + * Check the hwemul timer status.
> + * -1: hwemul timer is not enabled
> + * >0: hwemul timer is not expired yet; the return value is the TSC delta
> + *  0: hwemul timer expired already
> + */
> +int check_apic_hwemul_timer(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	if (apic->lapic_timer.hw_emulation) {
> +		u64 tscdeadline = apic->lapic_timer.tscdeadline;
> +		u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> +
> +		if (tscdeadline <= guest_tsc)
> +			return 0;
> +		else
> +			return (tscdeadline - guest_tsc);
> +	}
> +	return -1;
> +}
> +
> +int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu)
> +{
> +	if (!check_apic_hwemul_timer(vcpu)) {
> +		struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +		if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
> +			kvm_x86_ops->clear_hwemul_timer(vcpu);

If you call this function from the vmexit handler, you can do the
clearing unconditionally.

Also if you call this function from the vmexit handler you _know_ that
apic->lapic_timer.hw_emulation will be HWEMUL_INJECTED, and you can then
simplify this function to just:

	WARN_ON(apic->lapic_timer.hw_emulation != HWEMUL_INJECTED);
	WARN_ON(swait_active(&vcpu->wq));
	kvm_x86_ops->clear_hwemul_timer(vcpu);
	apic->lapic_timer.hw_emulation = 0;
	apic_timer_expired(apic);

> +		apic->lapic_timer.hw_emulation = 0;
> +		atomic_inc(&apic->lapic_timer.pending);
> +		kvm_set_pending_timer(vcpu);
> +		return 1;
>  	}
> +
> +	return 0;
> +}
> +
> +int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
> +{
> +	s64 hwemultsc;
> +
> +	hwemultsc = check_apic_hwemul_timer(vcpu);
> +	/* Just before vmentry, so inject even if expired */
> +	if (hwemultsc >= 0) {

If you follow the suggestion above, this becomes the only caller of
check_apic_hwemul_timer.  The function can be:

	if (apic->lapic_timer.hw_emulation == HWEMUL_NOT_USED)
		return;

	... compute delta...

	kvm_x86_ops->set_hwemul_timer(vcpu, tsc_delta);
	apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;

> +		struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +		kvm_x86_ops->set_hwemul_timer(vcpu, hwemultsc);
> +		apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
> +		return 1;
> +	}
> +
> +	return 0;
>  }
> +EXPORT_SYMBOL_GPL(inject_pending_hwemul_timer);
>  
>  static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
>  {
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 891c6da7d4aa..5037d7bf609a 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -12,6 +12,10 @@
>  #define KVM_APIC_SHORT_MASK	0xc0000
>  #define KVM_APIC_DEST_MASK	0x800
>  
> +#define HWEMUL_ENABLED		1
> +/* The VMCS has been set for the vmx preemption timer */
> +#define HWEMUL_INJECTED		2

Please define an enum with all three values:

enum {
    HV_TIMER_NOT_USED,
    HV_TIMER_NEEDS_ARMING,
    HV_TIMER_ARMED
};

and check for "!= HV_TIMER_NOT_USED" rather than simply "!= 0" or
"!apic->timer.hw_emulation".

>  struct kvm_timer {
>  	struct hrtimer timer;
>  	s64 period; 				/* unit: ns */
> @@ -20,6 +24,7 @@ struct kvm_timer {
>  	u64 tscdeadline;
>  	u64 expired_tscdeadline;
>  	atomic_t pending;			/* accumulated triggered timers */
> +	int hw_emulation;

Please rename to something like "hv_timer_state".

>  };
>  
>  struct kvm_lapic {
> @@ -212,4 +217,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  			struct kvm_vcpu **dest_vcpu);
>  int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
>  			const unsigned long *bitmap, u32 bitmap_size);
> +void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu);
> +void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu);

Please be consistent in the naming; you're using both "hwemul" and "hw".
 I'd prefer to use "hv", though "hw" is okay too (and then you should use it
everywhere).

> +int check_apic_hwemul_timer(struct kvm_vcpu *vcpu);

check_apic_hwemul_timer can be static.  (Also please rename to something
like kvm_lapic_get_tsc_delta).

> +int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu);

Please rename to something like kvm_lapic_check_timer.

Also, do not return int if you don't use the return value.

> +int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu);

Please rename to something like kvm_lapic_arm_hv_timer.

Also, do not return int if you don't use the return value.

> +static int handle_preemption_timer(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	if (apic->lapic_timer.hw_emulation != HWEMUL_INJECTED)
> +		printk(KERN_WARNING "Preemption timer w/o hwemulation\n");
> +
> +	if (!atomic_read(&apic->lapic_timer.pending)) {
> +		atomic_inc(&apic->lapic_timer.pending);
> +		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> +	}
> +
> +	apic->lapic_timer.hw_emulation = 0;
> +	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> +			PIN_BASED_VMX_PREEMPTION_TIMER);

Please just call inject_expired_hwemul_timer.  Using vcpu->arch.apic
from here violates the abstraction.
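
That is, roughly (sketch; the vmcs_clear_bits call moves into the
clear_hwemul_timer hook):

	static int handle_preemption_timer(struct kvm_vcpu *vcpu)
	{
		inject_expired_hwemul_timer(vcpu);
		return 1;
	}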

> +	return 1;
> +}
>  /*
>   * The exit handlers return 1 if the exit was handled fully and guest execution
>   * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
> @@ -7623,6 +7640,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
>  	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
>  	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
> +	[EXIT_REASON_PREEMPTION_TIMER]	      = handle_preemption_timer,
>  };
>  
>  static const int kvm_vmx_max_exit_handlers =
> @@ -8674,6 +8692,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>  		vmx_set_interrupt_shadow(vcpu, 0);
>  
> +	inject_pending_hwemul_timer(vcpu);
> +
>  	if (vmx->guest_pkru_valid)
>  		__write_pkru(vmx->guest_pkru);
>  
> @@ -10693,10 +10713,16 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	if (ple_gap)
>  		shrink_ple_window(vcpu);
> +	if (vmx_hwemul_timer(vcpu))
> +		switch_to_hw_lapic_timer(vcpu);

Please call this from kvm_arch_sched_in (checking
kvm_x86_ops->set_hwemul_timer for non-NULL).
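
Something like this (sketch):

	void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
	{
		kvm_x86_ops->sched_in(vcpu, cpu);
		if (kvm_x86_ops->set_hwemul_timer)
			switch_to_hw_lapic_timer(vcpu);
	}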

>  }
>  
>  static void vmx_sched_out(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	if (apic->lapic_timer.hw_emulation)
> +		switch_to_sw_lapic_timer(vcpu);

Please move this "if" to kvm_arch_sched_out instead of adding the
kvm_x86_ops member.
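
i.e. (sketch, using the sched_out hook signature that patch 1 adds):

	void kvm_arch_sched_out(struct kvm_vcpu *vcpu)
	{
		struct kvm_lapic *apic = vcpu->arch.apic;

		if (apic->lapic_timer.hw_emulation)
			switch_to_sw_lapic_timer(vcpu);
	}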

Thanks,

Paolo

>  }
>  
>  static void vmx_slot_enable_log_dirty(struct kvm *kvm,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
                   ` (5 preceding siblings ...)
  2016-05-20  6:03 ` [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Jan Kiszka
@ 2016-05-20 18:18 ` Marcelo Tosatti
  2016-05-20 18:21   ` Marcelo Tosatti
                     ` (2 more replies)
  6 siblings, 3 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2016-05-20 18:18 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: kvm, rkrcmar, pbonzini

On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> The VMX-preemption timer is a feature of VMX; it counts down, from the
> value loaded by VM entry, in VMX nonroot operation. When the timer
> counts down to zero, it stops counting down and a VM exit occurs.
> 
> The VMX preemption timer is used for tsc deadline timer virtualization. The
> VMX preemption timer is armed when the vCPU is running, and a VMExit
> will happen if the virtual TSC deadline timer expires.
> 
> When the vCPU thread is scheduled out, the tsc deadline timer
> virtualization will be switched to use the current solution, i.e. use
> the timer for it. It's switched back to VMX preemption timer when the
> vCPU thread is scheduled in.
> 
> This solution replaces the complex OS's hrtimer system, and also the
> host timer interrupt handling cost, with a preemption_timer VMexit. It
> fits well for some NFV usage scenario, when the vCPU is bound to a
> pCPU and the pCPU is isolated, or some similar scenarios.
> 
> However, it possibly has impact if the vCPU thread is scheduled in/out
> very frequently, because it switches from/to the hrtimer emulation a
> lot. A module parameter is provided to turn it on or off.
> 
> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>

Hi Yunhong Jiang,

This adds cost to the VM-exit and VM-entry paths (additional
instructions and i-cache pressure). Also it adds cost to
kvm_sched_out.

What is the benefit the switch from external interrupt to VMX preemption 
timer brings?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 18:18 ` Marcelo Tosatti
@ 2016-05-20 18:21   ` Marcelo Tosatti
  2016-05-20 20:49   ` Paolo Bonzini
  2016-05-20 22:18   ` Jiang, Yunhong
  2 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2016-05-20 18:21 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: kvm, rkrcmar, pbonzini

On Fri, May 20, 2016 at 03:18:30PM -0300, Marcelo Tosatti wrote:
> On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > value loaded by VM entry, in VMX nonroot operation. When the timer
> > counts down to zero, it stops counting down and a VM exit occurs.
> > 
> > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > will happen if the virtual TSC deadline timer expires.
> > 
> > When the vCPU thread is scheduled out, the tsc deadline timer
> > virtualization will be switched to use the current solution, i.e. use
> > the timer for it. It's switched back to VMX preemption timer when the
> > vCPU thread is scheduled in.
> > 
> > This solution replaces the complex OS's hrtimer system, and also the
> > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > fits well for some NFV usage scenario, when the vCPU is bound to a
> > pCPU and the pCPU is isolated, or some similar scenarios.
> > 
> > However, it possibly has impact if the vCPU thread is scheduled in/out
> > very frequently, because it switches from/to the hrtimer emulation a
> > lot. A module parameter is provided to turn it on or off.
> > 
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> 
> Hi Yunhong Jiang,
> 
> This adds cost to the VM-exit and VM-entry paths (additional
> instructions and i-cache pressure). Also it adds cost to
> kvm_sched_out.
> 
> What is the benefit the switch from external interrupt to VMX preemption 
> timer brings?

Do you have numbers for the improved interrupt latency in the guest ?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 18:18 ` Marcelo Tosatti
  2016-05-20 18:21   ` Marcelo Tosatti
@ 2016-05-20 20:49   ` Paolo Bonzini
  2016-05-20 22:27     ` Jiang, Yunhong
  2016-05-20 22:18   ` Jiang, Yunhong
  2 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-20 20:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yunhong Jiang, kvm, rkrcmar

> On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > value loaded by VM entry, in VMX nonroot operation. When the timer
> > counts down to zero, it stops counting down and a VM exit occurs.
> > 
> > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > will happen if the virtual TSC deadline timer expires.
> > 
> > When the vCPU thread is scheduled out, the tsc deadline timer
> > virtualization will be switched to use the current solution, i.e. use
> > the timer for it. It's switched back to VMX preemption timer when the
> > vCPU thread is scheduled in.
> > 
> > This solution replaces the complex OS's hrtimer system, and also the
> > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > fits well for some NFV usage scenario, when the vCPU is bound to a
> > pCPU and the pCPU is isolated, or some similar scenarios.
> > 
> > However, it possibly has impact if the vCPU thread is scheduled in/out
> > very frequently, because it switches from/to the hrtimer emulation a
> > lot. A module parameter is provided to turn it on or off.
> > 
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> 
> Hi Yunhong Jiang,
> 
> This adds cost to the VM-exit and VM-entry paths (additional
> instructions and i-cache pressure). Also it adds cost to
> kvm_sched_out.

For now my review limited itself to making the code nicer without
touching the overall design too much.

I'm confident that we can reduce it to a dozen instructions on vmentry
and only pay the cost of hrtimer_start on failed HLT polls (enabling
the hrtimer only before going to sleep).  Assuming that device
interrupts are delivered while the guest is running or during a
successful HLT poll, that would be very rare.
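
Roughly, in the halt path (sketch; "halt_poll_failed" is a placeholder for
the point in kvm_vcpu_block where polling gives up):

	if (halt_poll_failed) {
		switch_to_sw_lapic_timer(vcpu);	/* hrtimer_start here */
		schedule();
		switch_to_hw_lapic_timer(vcpu);	/* hrtimer_cancel here */
	}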

> What is the benefit the switch from external interrupt to VMX preemption
> timer brings?

hrtimer_start/hrtimer_cancel do show up in profiles for dynticks guests.
Since they touch a red-black tree, it shouldn't be hard to outperform them.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20  6:03 ` [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Jan Kiszka
  2016-05-20  9:41   ` Paolo Bonzini
@ 2016-05-20 21:50   ` Jiang, Yunhong
  1 sibling, 0 replies; 29+ messages in thread
From: Jiang, Yunhong @ 2016-05-20 21:50 UTC (permalink / raw)
  To: Jan Kiszka, Yunhong Jiang, kvm; +Cc: rkrcmar, pbonzini



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Jan Kiszka
> Sent: Thursday, May 19, 2016 11:04 PM
> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>; kvm@vger.kernel.org
> Cc: rkrcmar@redhat.com; pbonzini@redhat.com
> Subject: Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer
> virtualization
> 
> On 2016-05-20 03:44, Yunhong Jiang wrote:
> > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > value loaded by VM entry, in VMX nonroot operation. When the timer
> > counts down to zero, it stops counting down and a VM exit occurs.
> >
> > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > will happen if the virtual TSC deadline timer expires.
> >
> > When the vCPU thread is scheduled out, the tsc deadline timer
> > virtualization will be switched to use the current solution, i.e. use
> > the timer for it. It's switched back to VMX preemption timer when the
> > vCPU thread is scheduled in.
> >
> > This solution replaces the complex OS's hrtimer system, and also the
> > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > fits well for some NFV usage scenario, when the vCPU is bound to a
> > pCPU and the pCPU is isolated, or some similar scenarios.
> >
> > However, it possibly has impact if the vCPU thread is scheduled in/out
> > very frequently, because it switches from/to the hrtimer emulation a
> > lot. A module parameter is provided to turn it on or off.
> 
> IIRC, VMX preemption timer was broken on several older CPUs. That was
> one reason we didn't use it to emulate the nested preemption timer. Were
> all those CPUs also not exposing the TSC deadline timer?
> 
> In any case, just checking for the feature bit like you do in patch 2
> shouldn't be enough to catch those CPUs. Or were there microcode patches
> distributed by now that translated the errata into bit removals?

I didn't know about this and will check internally.

Replying from Outlook now because my Linux setup is broken.
Sorry if the formatting is wrong.

--jyh

> 
> Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-20 10:34   ` Paolo Bonzini
@ 2016-05-20 22:06     ` Jiang, Yunhong
  2016-05-21 12:38       ` Paolo Bonzini
  2016-05-22  0:21       ` Wanpeng Li
  0 siblings, 2 replies; 29+ messages in thread
From: Jiang, Yunhong @ 2016-05-20 22:06 UTC (permalink / raw)
  To: Paolo Bonzini, Yunhong Jiang, kvm; +Cc: rkrcmar



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Paolo Bonzini
> Sent: Friday, May 20, 2016 3:34 AM
> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>; kvm@vger.kernel.org
> Cc: rkrcmar@redhat.com
> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc
> deadline timer
> 
> 
> 
> On 20/05/2016 03:45, Yunhong Jiang wrote:
> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
> >
> > Utilizing the VMX preemption timer for tsc deadline timer
> > virtualization. The VMX preemption timer is armed when the vCPU is
> > running, and a VMExit will happen if the virtual TSC deadline timer
> > expires.
> >
> > When the vCPU thread is scheduled out, the tsc deadline timer
> > virtualization will be switched to use the current solution, i.e. use
> > the timer for it. It's switched back to VMX preemption timer when the
> > vCPU thread is scheduled in.
> >
> > This solution avoids the complex OS's hrtimer system, and also the host
> > timer interrupt handling cost, with a preemption_timer VMexit. It fits
> > well for some NFV usage scenario, when the vCPU is bound to a pCPU and
> > the pCPU is isolated, or some similar scenario.
> >
> > However, it possibly has impact if the vCPU thread is scheduled in/out
> > very frequently, because it switches from/to the hrtimer emulation a lot.
> >
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> > ---
> >  arch/x86/kvm/lapic.c | 108
> +++++++++++++++++++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/lapic.h |  10 +++++
> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> >  arch/x86/kvm/x86.c   |   6 +++
> >  4 files changed, 147 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5776473be362..a613bcfda59a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu
> *vcpu)
> >
> >  	local_irq_disable();
> >
> > +	inject_expired_hwemul_timer(vcpu);
> 
> Is this really fast enough (and does it trigger often enough) that it is
> worth slowing down all vmenters?
> 
> I'd rather call inject_expired_hwemul_timer from the preemption timer
> vmexit handler instead.  inject_pending_hwemul_timer will set the
> preemption timer countdown to zero if the deadline of the guest LAPIC
> timer has passed already.  This should be relatively rare.

Sure, I will take that approach in the new patch set. Let me give the reason it's done this way now.
Originally this patch was for running cyclictest on the guest with latency less than 15us for 24 hours.
So, if the timer has already expired before VM entry, we try to inject it immediately,
instead of waiting for an extra VMExit, which may cost 4~5 us.

But I do agree that this is not worth it just for this rare case, and I will change it as you suggest.

Thanks
--jyh

> 
> This is the only major change I would like to see.  Everything else is
> more cleanup and cosmetics.
> 
> >  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >  	    || need_resched() || signal_pending(current)) {
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> > @@ -6773,6 +6775,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >  			break;
> >
> >  		clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
> > +
> > +		/* Inject the hwemul timer if expired to avoid one VMExit */
> > +		inject_expired_hwemul_timer(vcpu);
> 
> Same as above.
> 
> >  		if (kvm_cpu_has_pending_timer(vcpu))
> >  			kvm_inject_pending_timer_irqs(vcpu);
> >
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 12c416929d9c..c9e32bf1a613 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1320,7 +1320,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
> >  		__delay(tsc_deadline - guest_tsc);
> >  }
> >
> > -static void start_sw_tscdeadline(struct kvm_lapic *apic)
> > +static void start_sw_tscdeadline(struct kvm_lapic *apic, int no_expire)
> >  {
> >  	u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
> >  	u64 ns = 0;
> > @@ -1337,7 +1337,8 @@ static void start_sw_tscdeadline(struct
> kvm_lapic *apic)
> >
> >  	now = apic->lapic_timer.timer.base->get_time();
> >  	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > -	if (likely(tscdeadline > guest_tsc)) {
> > +	/* Not trigger the apic_timer if invoked from sched_out */
> 
> This comment is not necessary here.  Rather, you should explain the
> deadlock better in switch_to_sw_lapic_timer.
> 
> > +	if (no_expire || likely(tscdeadline > guest_tsc)) {
> >  		ns = (tscdeadline - guest_tsc) * 1000000ULL;
> >  		do_div(ns, this_tsc_khz);
> >  		expire = ktime_add_ns(now, ns);
> > @@ -1396,9 +1397,110 @@ static void start_apic_timer(struct kvm_lapic
> *apic)
> >  			   ktime_to_ns(ktime_add_ns(now,
> >  					apic->lapic_timer.period)));
> >  	} else if (apic_lvtt_tscdeadline(apic)) {
> > -		start_sw_tscdeadline(apic);
> > +		/* lapic timer in tsc deadline mode */
> > +		if (hw_emul_timer(apic)) {
> > +			if (unlikely(!apic->lapic_timer.tscdeadline ||
> > +					!apic->vcpu->arch.virtual_tsc_khz))
> > +				return;
> > +
> > +			/* Expired timer will be checked on vcpu_run() */
> > +			apic->lapic_timer.hw_emulation =
> HWEMUL_ENABLED;
> > +		} else
> > +			start_sw_tscdeadline(apic, 0);
> > +	}
> > +}
> > +
> > +void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +	if (apic->lapic_timer.hw_emulation)
> > +		return;
> 
> This "if" never triggers.  Please change it to a WARN_ON?
> 
> > +	if (apic_lvtt_tscdeadline(apic) &&
> > +	    !atomic_read(&apic->lapic_timer.pending)) {
> > +		hrtimer_cancel(&apic->lapic_timer.timer);
> > +		/* In case the timer triggered in above small window */
> > +		if (!atomic_read(&apic->lapic_timer.pending))
> > +			apic->lapic_timer.hw_emulation =
> HWEMUL_ENABLED;
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(switch_to_hw_lapic_timer);
> > +
> > +void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +	if (!apic->lapic_timer.hw_emulation)
> > +		return;
> 
> This "if" never triggers.  Please change it to a WARN_ON.
> 
> > +	if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
> > +		kvm_x86_ops->clear_hwemul_timer(vcpu);
> > +	apic->lapic_timer.hw_emulation = 0;
> > +	if (atomic_read(&apic->lapic_timer.pending))
> > +		return;
> > +
> > +	/* Don't trigger the apic_timer_expired() for deadlock */
> 
> Can you explain this better?  For example:
> 
> 	/*
> 	 * Calling apic_timer_expired() from here results in a deadlock,
> 	 * because...
> 	 */
> 
> > +	start_sw_tscdeadline(apic, 1);
> > +}
> > +EXPORT_SYMBOL_GPL(switch_to_sw_lapic_timer);
> > +
> > +/*
> > + * Check the hwemul timer status.
> > + * -1: hwemul timer is not enabled
> > + * >0: hwemul timer is not expired yet, the return is the delta tsc
> > + *  0: hwemul timer expired already
> > + */
> > +int check_apic_hwemul_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +	if (apic->lapic_timer.hw_emulation) {
> > +		u64 tscdeadline = apic->lapic_timer.tscdeadline;
> > +		u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > +
> > +		if (tscdeadline <= guest_tsc)
> > +			return 0;
> > +		else
> > +			return (tscdeadline - guest_tsc);
> > +	}
> > +	return -1;
> > +}
> > +
> > +int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!check_apic_hwemul_timer(vcpu)) {
> > +		struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +		if (apic->lapic_timer.hw_emulation == HWEMUL_INJECTED)
> > +			kvm_x86_ops->clear_hwemul_timer(vcpu);
> 
> If you call this function from the vmexit handler, you can do the
> clearing unconditionally.
> 
> Also if you call this function from the vmexit handler you _know_ that
> apic->lapic_timer.hw_emulation will be HWEMUL_INJECTED, and you can
> then
> simplify this function to just:
> 
> 	WARN_ON(apic->lapic_timer.hw_emulation != HWEMUL_INJECTED);
> 	WARN_ON(swait_active(&vcpu->wq));
> 	kvm_x86_ops->clear_hwemul_timer(vcpu);
> 	apic->lapic_timer.hw_emulation = 0;
> 	apic_timer_expired(apic);
> 
> > +		apic->lapic_timer.hw_emulation = 0;
> > +		atomic_inc(&apic->lapic_timer.pending);
> > +		kvm_set_pending_timer(vcpu);
> > +		return 1;
> >  	}
> > +
> > +	return 0;
> > +}
> > +
> > +int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	u64 hwemultsc;
> > +
> > +	hwemultsc = check_apic_hwemul_timer(vcpu);
> > +	/* Just before vmentry, so inject even if expired */
> > +	if (hwemultsc >= 0) {
> 
> If you follow the suggestion above, this becomes the only caller of
> check_apic_hwemul_timer.  The function can be:
> 
> 	if (apic->lapic_timer.hw_emulation == HWEMUL_NOT_USED)
> 		return;
> 
> 	... compute delta...
> 
> 	kvm_x86_ops->set_hwemul_timer(vcpu, tsc_delta);
> 	apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
> 
> > +		struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +		kvm_x86_ops->set_hwemul_timer(vcpu, hwemultsc);
> > +		apic->lapic_timer.hw_emulation = HWEMUL_INJECTED;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(inject_pending_hwemul_timer);
> >
> >  static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32
> lvt0_val)
> >  {
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index 891c6da7d4aa..5037d7bf609a 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -12,6 +12,10 @@
> >  #define KVM_APIC_SHORT_MASK	0xc0000
> >  #define KVM_APIC_DEST_MASK	0x800
> >
> > +#define HWEMUL_ENABLED		1
> > +/* The VMCS has been set for the vmx preemption timer */
> > +#define HWEMUL_INJECTED		2
> 
> Please define an enum with all three values:
> 
> enum {
>     HV_TIMER_NOT_USED,
>     HV_TIMER_NEEDS_ARMING,
>     HV_TIMER_ARMED
> };
> 
> and check for "!= HV_TIMER_NOT_USED" rather than simply "!= 0" or
> "!apic->timer.hw_emulation".
> 
> >  struct kvm_timer {
> >  	struct hrtimer timer;
> >  	s64 period; 				/* unit: ns */
> > @@ -20,6 +24,7 @@ struct kvm_timer {
> >  	u64 tscdeadline;
> >  	u64 expired_tscdeadline;
> >  	atomic_t pending;			/* accumulated triggered
> timers */
> > +	int hw_emulation;
> 
> Please rename to something like "hv_timer_state".
> 
> >  };
> >
> >  struct kvm_lapic {
> > @@ -212,4 +217,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm
> *kvm, struct kvm_lapic_irq *irq,
> >  			struct kvm_vcpu **dest_vcpu);
> >  int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
> >  			const unsigned long *bitmap, u32 bitmap_size);
> > +void switch_to_sw_lapic_timer(struct kvm_vcpu *vcpu);
> > +void switch_to_hw_lapic_timer(struct kvm_vcpu *vcpu);
> 
> Please be consistent in the naming; you're using both "hwemul" and "hw".
>  I'd prefer to use "hv", though "hw" is okay too (and then you should use it
> everywhere).
> 
> > +int check_apic_hwemul_timer(struct kvm_vcpu *vcpu);
> 
> check_apic_hwemul_timer can be static.  (Also please rename to something
> like kvm_lapic_get_tsc_delta).
> 
> > +int inject_expired_hwemul_timer(struct kvm_vcpu *vcpu);
> 
> Please rename to something like kvm_lapic_check_timer.
> 
> Also, do not return int if you don't use the return value.
> 
> > +int inject_pending_hwemul_timer(struct kvm_vcpu *vcpu);
> 
> Please rename to something like kvm_lapic_arm_hv_timer.
> 
> Also, do not return int if you don't use the return value.
> 
> > +static int handle_preemption_timer(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +	if (apic->lapic_timer.hw_emulation != HWEMUL_INJECTED)
> > +		printk(KERN_WARNING "Preemption timer w/o
> hwemulation\n");
> > +
> > +	if (!atomic_read(&apic->lapic_timer.pending)) {
> > +		atomic_inc(&apic->lapic_timer.pending);
> > +		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> > +	}
> > +
> > +	apic->lapic_timer.hw_emulation = 0;
> > +	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> > +			PIN_BASED_VMX_PREEMPTION_TIMER);
> 
> Please just call inject_expired_hwemul_timer.  Using vcpu->arch.apic
> from here violates the abstraction.
> 
> > +	return 1;
> > +}
> >  /*
> >   * The exit handlers return 1 if the exit was handled fully and guest
> execution
> >   * may resume.  Otherwise they set the kvm_run parameter to indicate
> what needs
> > @@ -7623,6 +7640,7 @@ static int (*const
> kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> >  	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
> >  	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
> >  	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
> > +	[EXIT_REASON_PREEMPTION_TIMER]	      =
> handle_preemption_timer,
> >  };
> >
> >  static const int kvm_vmx_max_exit_handlers =
> > @@ -8674,6 +8692,8 @@ static void __noclone vmx_vcpu_run(struct
> kvm_vcpu *vcpu)
> >  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> >  		vmx_set_interrupt_shadow(vcpu, 0);
> >
> > +	inject_pending_hwemul_timer(vcpu);
> > +
> >  	if (vmx->guest_pkru_valid)
> >  		__write_pkru(vmx->guest_pkru);
> >
> > @@ -10693,10 +10713,16 @@ static void vmx_sched_in(struct kvm_vcpu
> *vcpu, int cpu)
> >  {
> >  	if (ple_gap)
> >  		shrink_ple_window(vcpu);
> > +	if (vmx_hwemul_timer(vcpu))
> > +		switch_to_hw_lapic_timer(vcpu);
> 
> Please call this from kvm_arch_sched_in (checking
> kvm_x86_ops->set_hwemul_timer for non-NULL).
> 
> >  }
> >
> >  static void vmx_sched_out(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +
> > +	if (apic->lapic_timer.hw_emulation)
> > +		switch_to_sw_lapic_timer(vcpu);
> 
> Please move this "if" to kvm_arch_sched_out instead of adding the
> kvm_x86_ops member.
> 
> Thanks,
> 
> Paolo
> 
> >  }
> >
> >  static void vmx_slot_enable_log_dirty(struct kvm *kvm,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 18:18 ` Marcelo Tosatti
  2016-05-20 18:21   ` Marcelo Tosatti
  2016-05-20 20:49   ` Paolo Bonzini
@ 2016-05-20 22:18   ` Jiang, Yunhong
  2016-05-21  0:45     ` Marcelo Tosatti
  2 siblings, 1 reply; 29+ messages in thread
From: Jiang, Yunhong @ 2016-05-20 22:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Yunhong Jiang; +Cc: kvm, rkrcmar, pbonzini



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Marcelo Tosatti
> Sent: Friday, May 20, 2016 11:19 AM
> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>
> Cc: kvm@vger.kernel.org; rkrcmar@redhat.com; pbonzini@redhat.com
> Subject: Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer
> virtualization
> 
> On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > value loaded by VM entry, in VMX nonroot operation. When the timer
> > counts down to zero, it stops counting down and a VM exit occurs.
> >
> > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > will happen if the virtual TSC deadline timer expires.
> >
> > When the vCPU thread is scheduled out, the tsc deadline timer
> > virtualization will be switched to use the current solution, i.e. use
> > the timer for it. It's switched back to VMX preemption timer when the
> > vCPU thread is scheduled in.
> >
> > This solution replaces the complex OS's hrtimer system, and also the
> > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > fits well for some NFV usage scenario, when the vCPU is bound to a
> > pCPU and the pCPU is isolated, or some similar scenarios.
> >
> > However, it possibly has impact if the vCPU thread is scheduled in/out
> > very frequently, because it switches from/to the hrtimer emulation a
> > lot. A module parameter is provided to turn it on or off.
> >
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> 
> Hi Yunhong Jiang,
> 
> This adds cost to the VM-exit and VM-entry paths (additional
> instructions and i-cache pressure). Also it adds cost to
> kvm_sched_out.

Hi, Marcelo,
     Thanks for the reply. As I replied to Paolo's previous mail, I will change it so
that there will be no extra cost on the VM-exit/entry paths.

     Yes, it adds cost to kvm_sched_out/in as stated in the commit
message, so it's not good if the thread is scheduled in/out a lot.

> 
> What is the benefit the switch from external interrupt to VMX preemption
> timer brings?

The saving is the external-interrupt exit cost plus the hrtimer handler cost. I will include data with my next patch set.

Thanks
--jyh

> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 20:49   ` Paolo Bonzini
@ 2016-05-20 22:27     ` Jiang, Yunhong
  2016-05-20 23:53       ` yunhong jiang
  0 siblings, 1 reply; 29+ messages in thread
From: Jiang, Yunhong @ 2016-05-20 22:27 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Yunhong Jiang, kvm, rkrcmar



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Paolo Bonzini
> Sent: Friday, May 20, 2016 1:49 PM
> To: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@linux.intel.com>; kvm@vger.kernel.org;
> rkrcmar@redhat.com
> Subject: Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer
> virtualization
> 
> > On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > > value loaded by VM entry, in VMX nonroot operation. When the timer
> > > counts down to zero, it stops counting down and a VM exit occurs.
> > >
> > > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > > will happen if the virtual TSC deadline timer expires.
> > >
> > > When the vCPU thread is scheduled out, the tsc deadline timer
> > > virtualization will be switched to use the current solution, i.e. use
> > > the timer for it. It's switched back to VMX preemption timer when the
> > > vCPU thread is scheduled in.
> > >
> > > This solution replaces the complex OS's hrtimer system, and also the
> > > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > > fits well for some NFV usage scenario, when the vCPU is bound to a
> > > pCPU and the pCPU is isolated, or some similar scenarios.
> > >
> > > However, it possibly has impact if the vCPU thread is scheduled in/out
> > > very frequently, because it switches from/to the hrtimer emulation a
> > > lot. A module parameter is provided to turn it on or off.
> > >
> > > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> >
> > Hi Yunhong Jiang,
> >
> > This adds cost to the VM-exit and VM-entry paths (additional
> > instructions and i-cache pressure). Also it adds cost to
> > kvm_sched_out.
> 
> For now my review limited itself to making the code nicer without
> touching the overall design too much.
> 
> I'm confident that we can reduce it to a dozen instructions on vmentry

If we set the preemption timer on the first vm-entry, and then there is a
vm-exit for another reason, do we need to update the
VMX_PREEMPTION_TIMER_VALUE field on the second vm-entry? If not, then we
don't need anything special on vmentry. We simply set the VMCS in sched_in
or when the guest programs the timer.

Thanks
--jyh

> and only pay the cost of hrtimer_start on failed HLT polls (enabling
> the hrtimer only before going to sleep).  Assuming that device
> interrupts are delivered while the guest is running or during a
> successful HLT poll, that would be very rare.
> 
> > What is the benefit the switch from external interrupt to VMX preemption
> > timer brings?
> 
> hrtimer_start/hrtimer_cancel do show up in profiles for dynticks guests.
> Since they touch a red-black tree, it shouldn't be hard to outperform them.
> 
> Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 22:27     ` Jiang, Yunhong
@ 2016-05-20 23:53       ` yunhong jiang
  0 siblings, 0 replies; 29+ messages in thread
From: yunhong jiang @ 2016-05-20 23:53 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Paolo Bonzini, Marcelo Tosatti, kvm, rkrcmar

On Fri, 20 May 2016 22:27:08 +0000
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> > On Behalf Of Paolo Bonzini
> > Sent: Friday, May 20, 2016 1:49 PM
> > To: Marcelo Tosatti <mtosatti@redhat.com>
> > Cc: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
> > kvm@vger.kernel.org; rkrcmar@redhat.com
> > Subject: Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer
> > virtualization
> > 
> > > On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > > > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > > > value loaded by VM entry, in VMX nonroot operation. When the
> > > > timer counts down to zero, it stops counting down and a VM exit
> > > > occurs.
> > > >
> > > > The VMX preemption timer is used for tsc deadline timer virtualization.
> > > > The VMX preemption timer is armed when the vCPU is running, and
> > > > a VMExit will happen if the virtual TSC deadline timer expires.
> > > >
> > > > When the vCPU thread is scheduled out, the tsc deadline timer
> > > > virtualization will be switched to use the current solution,
> > > > i.e. use the timer for it. It's switched back to VMX preemption
> > > > timer when the vCPU thread is scheduled in.
> > > >
> > > > This solution replaces the complex OS's hrtimer system, and also
> > > > the host timer interrupt handling cost, with a preemption_timer
> > > > VMexit. It fits well for some NFV usage scenario, when the vCPU
> > > > is bound to a pCPU and the pCPU is isolated, or some similar
> > > > scenarios.
> > > >
> > > > However, it possibly has impact if the vCPU thread is scheduled
> > > > in/out very frequently, because it switches from/to the hrtimer
> > > > emulation a lot. A module parameter is provided to turn it on
> > > > or off.
> > > >
> > > > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> > >
> > > Hi Yunhong Jiang,
> > >
> > > This adds cost to the VM-exit and VM-entry paths (additional
> > > instructions and i-cache pressure). Also it adds cost to
> > > kvm_sched_out.
> > 
> > For now my review limited itself to making the code nicer without
> > touching the overall design too much.
> > 
> > I'm confident that we can reduce it to a dozen instructions on
> > vmentry
> 
> If we set the preemption timer on the first vm-entry, and then there
> is a vm-exit for another reason, do we need to update the
> VMX_PREEMPTION_TIMER_VALUE field on the second vm-entry? If not, then
> we don't need anything special on vmentry. We simply set the VMCS in
> sched_in or when the guest programs the timer.

Please just ignore this; I was brain-dead at the time. We have to set up the
VMX_PREEMPTION_TIMER_VALUE every time, since the countdown is consumed while
the guest runs.
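
So the vmx hook has to reload the countdown before every entry, roughly
(untested sketch; "preemption_timer_rate" is a made-up name for the
TSC-to-preemption-timer shift from MSR_IA32_VMX_MISC bits 4:0):

	static void vmx_set_hwemul_timer(struct kvm_vcpu *vcpu, u64 delta_tsc)
	{
		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
			     delta_tsc >> preemption_timer_rate);
		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
			      PIN_BASED_VMX_PREEMPTION_TIMER);
	}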

--jyh

> 
> Thanks
> --jyh
> 
> > and only pay the cost of hrtimer_start on failed HLT polls (enabling
> > the hrtimer only before going to sleep).  Assuming that device
> > interrupts are delivered while the guest is running or during a
> > successful HLT poll, that would be very rare.
> > 
> > > What is the benefit the switch from external interrupt to VMX
> > > preemption timer brings?
> > 
> > hrtimer_start/hrtimer_cancel do show up in profiles for dynticks
> > guests. Since they touch a red-black tree, it shouldn't be hard to
> > outperform them.
> > 
> > Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization
  2016-05-20 22:18   ` Jiang, Yunhong
@ 2016-05-21  0:45     ` Marcelo Tosatti
  0 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2016-05-21  0:45 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Yunhong Jiang, kvm, rkrcmar, pbonzini

On Fri, May 20, 2016 at 10:18:47PM +0000, Jiang, Yunhong wrote:
> 
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> > Behalf Of Marcelo Tosatti
> > Sent: Friday, May 20, 2016 11:19 AM
> > To: Yunhong Jiang <yunhong.jiang@linux.intel.com>
> > Cc: kvm@vger.kernel.org; rkrcmar@redhat.com; pbonzini@redhat.com
> > Subject: Re: [RFC PATCH 0/5] Utilizing VMX preemption for timer
> > virtualization
> > 
> > On Thu, May 19, 2016 at 06:44:58PM -0700, Yunhong Jiang wrote:
> > > The VMX-preemption timer is a feature of VMX; it counts down, from the
> > > value loaded by VM entry, in VMX nonroot operation. When the timer
> > > counts down to zero, it stops counting down and a VM exit occurs.
> > >
> > > The VMX preemption timer is used for tsc deadline timer virtualization. The
> > > VMX preemption timer is armed when the vCPU is running, and a VMExit
> > > will happen if the virtual TSC deadline timer expires.
> > >
> > > When the vCPU thread is scheduled out, the tsc deadline timer
> > > virtualization will be switched to use the current solution, i.e. use
> > > the timer for it. It's switched back to VMX preemption timer when the
> > > vCPU thread is scheduled in.
> > >
> > > This solution replaces the complex OS's hrtimer system, and also the
> > > host timer interrupt handling cost, with a preemption_timer VMexit. It
> > > fits well for some NFV usage scenario, when the vCPU is bound to a
> > > pCPU and the pCPU is isolated, or some similar scenarios.
> > >
> > > However, it possibly has impact if the vCPU thread is scheduled in/out
> > > very frequently, because it switches from/to the hrtimer emulation a
> > > lot. A module parameter is provided to turn it on or off.
> > >
> > > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> > 
> > Hi Yunhong Jiang,
> > 
> > This adds cost to the VM-exit and VM-entry paths (additional
> > instructions and i-cache pressure). Also it adds cost to
> > kvm_sched_out.
> 
> Hi, Marcelo,
>      Thanks for the reply. As I replied to Paolo's previous mail, I will change it so
> that there will be no extra cost on the VM-exit/entry paths.
> 
>      Yes, it adds cost to kvm_sched_out/in as stated in the commit
> message, so it's not good if the thread is scheduled in/out a lot.
> 
> > 
> > What is the benefit the switch from external interrupt to VMX preemption
> > timer brings?
> 
> The saving is the external-interrupt exit cost plus the hrtimer handler cost. I will include data with my next patch set.
> 
> Thanks
> --jyh

The data will be useful.

Paolo's argument about hrtimer_start/stop makes sense. 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-20 22:06     ` Jiang, Yunhong
@ 2016-05-21 12:38       ` Paolo Bonzini
  2016-05-22  0:21       ` Wanpeng Li
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-21 12:38 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: Yunhong Jiang, kvm, rkrcmar



----- Original Message -----
> From: "Yunhong Jiang" <yunhong.jiang@intel.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, "Yunhong Jiang" <yunhong.jiang@linux.intel.com>, kvm@vger.kernel.org
> Cc: rkrcmar@redhat.com
> Sent: Saturday, May 21, 2016 12:06:16 AM
> Subject: RE: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
> 
> 
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> > Behalf Of Paolo Bonzini
> > Sent: Friday, May 20, 2016 3:34 AM
> > To: Yunhong Jiang <yunhong.jiang@linux.intel.com>; kvm@vger.kernel.org
> > Cc: rkrcmar@redhat.com
> > Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc
> > deadline timer
> > 
> > 
> > 
> > On 20/05/2016 03:45, Yunhong Jiang wrote:
> > > From: Yunhong Jiang <yunhong.jiang@gmail.com>
> > >
> > > Utilizing the VMX preemption timer for tsc deadline timer
> > > virtualization. The VMX preemption timer is armed when the vCPU is
> > > running, and a VMExit will happen if the virtual TSC deadline timer
> > > expires.
> > >
> > > When the vCPU thread is scheduled out, the tsc deadline timer
> > > virtualization will be switched to use the current solution, i.e. use
> > > the timer for it. It's switched back to VMX preemption timer when the
> > > vCPU thread is scheduled in.
> > >
> > > This solution avoids the complex OS's hrtimer system, and also the host
> > > timer interrupt handling cost, with a preemption_timer VMexit. It fits
> > > well for some NFV usage scenario, when the vCPU is bound to a pCPU and
> > > the pCPU is isolated, or some similar scenario.
> > >
> > > However, it possibly has impact if the vCPU thread is scheduled in/out
> > > very frequently, because it switches from/to the hrtimer emulation a lot.
> > >
> > > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> > > ---
> > >  arch/x86/kvm/lapic.c | 108
> > +++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  arch/x86/kvm/lapic.h |  10 +++++
> > >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> > >  arch/x86/kvm/x86.c   |   6 +++
> > >  4 files changed, 147 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5776473be362..a613bcfda59a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu
> > *vcpu)
> > >
> > >  	local_irq_disable();
> > >
> > > +	inject_expired_hwemul_timer(vcpu);
> > 
> > Is this really fast enough (and does it trigger often enough) that it is
> > worth slowing down all vmenters?
> > 
> > I'd rather call inject_expired_hwemul_timer from the preemption timer
> > vmexit handler instead.  inject_pending_hwemul_timer will set the
> > preemption timer countdown to zero if the deadline of the guest LAPIC
> > timer has passed already.  This should be relatively rare.
> 
> Sure, I will take that approach in the new patch set. Let me give the
> reason it's done this way now.  Originally this patch was for running
> cyclictest on the guest with latency less than 15us for 24 hours.  So, if
> the timer has already expired before VM entry, we try to inject it
> immediately, instead of waiting for an extra VMExit, which may cost 4~5 us.

This seems too much...  A vmexit+vmentry on Ivy Bridge or newer is around
1200-1500 cycles, which should give 1-2 microseconds at most including the
time to inject the interrupt (at 3 GHz, 1500 cycles is only 0.5 microseconds).

There are a few more ideas that I have about optimizing the preemption timer,
hopefully we can get it down to that and not pessimize the sched_out/sched_in
case.  Instead, I think what we want to touch is the blocking/unblocking
callback.  Wanpeng Li's patches to handle the APIC timer specially in
kvm_vcpu_block could also help with this.  However, there's time for that.
Please keep sched_out/sched_in in your next submission, and we can work on
it a step at a time.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-20 22:06     ` Jiang, Yunhong
  2016-05-21 12:38       ` Paolo Bonzini
@ 2016-05-22  0:21       ` Wanpeng Li
  2016-05-23 22:58         ` yunhong jiang
  1 sibling, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2016-05-22  0:21 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Paolo Bonzini, Yunhong Jiang, kvm, rkrcmar

2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@intel.com>:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Paolo Bonzini
>> Sent: Friday, May 20, 2016 3:34 AM
>> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>; kvm@vger.kernel.org
>> Cc: rkrcmar@redhat.com
>> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc
>> deadline timer
>>
>>
>>
>> On 20/05/2016 03:45, Yunhong Jiang wrote:
>> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
>> >
>> > Utilizing the VMX preemption timer for tsc deadline timer
>> > virtualization. The VMX preemption timer is armed when the vCPU is
>> > running, and a VMExit will happen if the virtual TSC deadline timer
>> > expires.
>> >
>> > When the vCPU thread is scheduled out, the tsc deadline timer
>> > virtualization will be switched to use the current solution, i.e. use
>> > the timer for it. It's switched back to VMX preemption timer when the
>> > vCPU thread is scheduled in.
>> >
>> > This solution avoids the complex OS's hrtimer system, and also the host
>> > timer interrupt handling cost, with a preemption_timer VMexit. It fits
>> > well for some NFV usage scenario, when the vCPU is bound to a pCPU and
>> > the pCPU is isolated, or some similar scenario.
>> >
>> > However, it possibly has impact if the vCPU thread is scheduled in/out
>> > very frequently, because it switches from/to the hrtimer emulation a lot.
>> >
>> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
>> > ---
>> >  arch/x86/kvm/lapic.c | 108
>> +++++++++++++++++++++++++++++++++++++++++++++++++--
>> >  arch/x86/kvm/lapic.h |  10 +++++
>> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>> >  arch/x86/kvm/x86.c   |   6 +++
>> >  4 files changed, 147 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index 5776473be362..a613bcfda59a 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu
>> *vcpu)
>> >
>> >     local_irq_disable();
>> >
>> > +   inject_expired_hwemul_timer(vcpu);
>>
>> Is this really fast enough (and does it trigger often enough) that it is
>> worth slowing down all vmenters?
>>
>> I'd rather call inject_expired_hwemul_timer from the preemption timer
>> vmexit handler instead.  inject_pending_hwemul_timer will set the
>> preemption timer countdown to zero if the deadline of the guest LAPIC
>> timer has passed already.  This should be relatively rare.
>
> Sure, I will take that approach in the new patch set. Let me give the reason it's done this way now.
> Originally this patch was for running cyclictest on the guest with latency less than 15us for 24 hours.
> So, if the timer has already expired before VM entry, we try to inject it immediately,
> instead of waiting for an extra VMExit, which may cost 4~5 us.

inject_expired_hwemul_timer() just sets the pending bit, and we still need
a vmexit to finally return to vcpu_run(), which is the only place that checks
the pending bit and injects APIC_LVTT. So how does adding
inject_expired_hwemul_timer() in vcpu_enter_guest() avoid an extra vmexit?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-22  0:21       ` Wanpeng Li
@ 2016-05-23 22:58         ` yunhong jiang
  2016-05-24  0:53           ` Wanpeng Li
  0 siblings, 1 reply; 29+ messages in thread
From: yunhong jiang @ 2016-05-23 22:58 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

On Sun, 22 May 2016 08:21:50 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:

> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@intel.com>:
> >
> >
> >> -----Original Message-----
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> >> On Behalf Of Paolo Bonzini
> >> Sent: Friday, May 20, 2016 3:34 AM
> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for
> >> tsc deadline timer
> >>
> >>
> >>
> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
> >> >
> >> > Utilizing the VMX preemption timer for tsc deadline timer
> >> > virtualization. The VMX preemption timer is armed when the vCPU
> >> > is running, and a VMExit will happen if the virtual TSC deadline
> >> > timer expires.
> >> >
> >> > When the vCPU thread is scheduled out, the tsc deadline timer
> >> > virtualization will be switched to use the current solution,
> >> > i.e. use the timer for it. It's switched back to VMX preemption
> >> > timer when the vCPU thread is scheduled in.
> >> >
> >> > This solution avoids the complex OS's hrtimer system, and also
> >> > the host timer interrupt handling cost, with a preemption_timer
> >> > VMexit. It fits well for some NFV usage scenario, when the vCPU
> >> > is bound to a pCPU and the pCPU is isolated, or some similar
> >> > scenario.
> >> >
> >> > However, it possibly has impact if the vCPU thread is scheduled
> >> > in/out very frequently, because it switches from/to the hrtimer
> >> > emulation a lot.
> >> >
> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> >> > ---
> >> >  arch/x86/kvm/lapic.c | 108
> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
> >> >  arch/x86/kvm/lapic.h |  10 +++++
> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> >> >  arch/x86/kvm/x86.c   |   6 +++
> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> > index 5776473be362..a613bcfda59a 100644
> >> > --- a/arch/x86/kvm/x86.c
> >> > +++ b/arch/x86/kvm/x86.c
> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu
> >> *vcpu)
> >> >
> >> >     local_irq_disable();
> >> >
> >> > +   inject_expired_hwemul_timer(vcpu);
> >>
> >> Is this really fast enough (and does it trigger often enough) that
> >> it is worth slowing down all vmenters?
> >>
> >> I'd rather call inject_expired_hwemul_timer from the preemption
> >> timer vmexit handler instead.  inject_pending_hwemul_timer will
> >> set the preemption timer countdown to zero if the deadline of the
> >> guest LAPIC timer has passed already.  This should be relatively
> >> rare.
> >
> > Sure, I will take that approach in the new patch set. Let me give
> > the reason it's done this way now. Originally this patch was for
> > running cyclictest on the guest with latency less than 15us for 24
> > hours. So, if the timer has already expired before VM entry, we try
> > to inject it immediately, instead of waiting for an extra VMExit,
> > which may cost 4~5 us.
> 
> inject_expired_hwemul_timer() just sets the pending bit, and we still need
> a vmexit to finally return to vcpu_run(), which is the only place that checks
> the pending bit and injects APIC_LVTT. So how does adding
> inject_expired_hwemul_timer() in vcpu_enter_guest() avoid an extra vmexit?

inject_expired_hwemul_timer() will invoke kvm_make_request(), which will
cause KVM to try to inject the timer interrupt directly. Please note that
vcpu_enter_guest() rechecks the requests later.
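
The relevant part of vcpu_enter_guest() with the patch applied is roughly
(trimmed sketch of the hunk quoted above):

	local_irq_disable();

	inject_expired_hwemul_timer(vcpu);	/* may set KVM_REQ_PENDING_TIMER */

	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
	    || need_resched() || signal_pending(current)) {
		/*
		 * The new request makes this test true, so we bail out to
		 * vcpu_run(), which injects the pending timer irq without
		 * a preemption-timer VMexit in between.
		 */
	}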

Thanks
--jyh

> 
> Regards,
> Wanpeng Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-23 22:58         ` yunhong jiang
@ 2016-05-24  0:53           ` Wanpeng Li
  2016-05-24  0:55             ` yunhong jiang
  0 siblings, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2016-05-24  0:53 UTC (permalink / raw)
  To: yunhong jiang; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

2016-05-24 6:58 GMT+08:00 yunhong jiang <yunhong.jiang@linux.intel.com>:
> On Sun, 22 May 2016 08:21:50 +0800
> Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@intel.com>:
>> >
>> >
>> >> -----Original Message-----
>> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>> >> On Behalf Of Paolo Bonzini
>> >> Sent: Friday, May 20, 2016 3:34 AM
>> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
>> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
>> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for
>> >> tsc deadline timer
>> >>
>> >>
>> >>
>> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
>> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
>> >> >
>> >> > Utilizing the VMX preemption timer for tsc deadline timer
>> >> > virtualization. The VMX preemption timer is armed when the vCPU
>> >> > is running, and a VMExit will happen if the virtual TSC deadline
>> >> > timer expires.
>> >> >
>> >> > When the vCPU thread is scheduled out, the tsc deadline timer
>> >> > virtualization will be switched to use the current solution,
>> >> > i.e. use the timer for it. It's switched back to VMX preemption
>> >> > timer when the vCPU thread is scheduled in.
>> >> >
>> >> > This solution avoids the complex OS's hrtimer system, and also
>> >> > the host timer interrupt handling cost, with a preemption_timer
>> >> > VMexit. It fits well for some NFV usage scenario, when the vCPU
>> >> > is bound to a pCPU and the pCPU is isolated, or some similar
>> >> > scenario.
>> >> >
>> >> > However, it possibly has impact if the vCPU thread is scheduled
>> >> > in/out very frequently, because it switches from/to the hrtimer
>> >> > emulation a lot.
>> >> >
>> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
>> >> > ---
>> >> >  arch/x86/kvm/lapic.c | 108
>> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
>> >> >  arch/x86/kvm/lapic.h |  10 +++++
>> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>> >> >  arch/x86/kvm/x86.c   |   6 +++
>> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> > index 5776473be362..a613bcfda59a 100644
>> >> > --- a/arch/x86/kvm/x86.c
>> >> > +++ b/arch/x86/kvm/x86.c
>> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu
>> >> *vcpu)
>> >> >
>> >> >     local_irq_disable();
>> >> >
>> >> > +   inject_expired_hwemul_timer(vcpu);
>> >>
>> >> Is this really fast enough (and does it trigger often enough) that
>> >> it is worth slowing down all vmenters?
>> >>
>> >> I'd rather call inject_expired_hwemul_timer from the preemption
>> >> timer vmexit handler instead.  inject_pending_hwemul_timer will
>> >> set the preemption timer countdown to zero if the deadline of the
>> >> guest LAPIC timer has passed already.  This should be relatively
>> >> rare.
>> >
>> > Sure and will take this way on the new patch set. I'd give some
>> > reson why it's this way now. Originally this patch was for
>> > cyclictest on guest with latency less than 15us for 24 hours. So,
>> > if the timer expires already before VM entry, we try to inject it
>> > immediately, instead of waiting for an extra VMExit, which may be
>> > 4~5 us.
>>
>> inject_expired_hwemul_timer() just sets the pending bit, and a vmexit is
>> still needed to finally exit to vcpu_run(), which is the only place that
>> checks pending and injects APIC_LVTT, so how does adding
>> inject_expired_hwemul_timer() in vcpu_enter_guest() avoid an extra vmexit?
>
> The inject_expired_hwemul_timer() will invoke kvm_make_request(), which
> will cause KVM to try to inject the timer interrupt directly. Please
> notice that vcpu_enter_guest() will recheck the requests later.

Actually I didn't find another place that injects pending timer irqs
except in vcpu_run, though the comment on kvm_set_pending_timer mentions
that it is implicitly checked in vcpu_enter_guest.
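
For reference, this is roughly the code I am looking at (a simplified
sketch of the 4.6-era arch/x86/kvm/x86.c; exact lines differ between
kernel versions):

    /* kvm_set_pending_timer() only makes the request and kicks: */
    void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
    {
            /*
             * Note: KVM_REQ_PENDING_TIMER is implicitly checked
             * in vcpu_enter_guest.
             */
            kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
            kvm_vcpu_kick(vcpu);
    }

    /* ... and the only explicit injection site is the vcpu_run() loop: */
    for (;;) {
            if (kvm_vcpu_running(vcpu))
                    r = vcpu_enter_guest(vcpu);
            else
                    r = vcpu_block(vcpu->kvm, vcpu);
            if (r <= 0)
                    break;

            clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
            if (kvm_cpu_has_pending_timer(vcpu))
                    kvm_inject_pending_timer_irqs(vcpu);
            /* ... signal and resched handling elided ... */
    }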

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-24  0:53           ` Wanpeng Li
@ 2016-05-24  0:55             ` yunhong jiang
  2016-05-24  1:16               ` Wanpeng Li
  0 siblings, 1 reply; 29+ messages in thread
From: yunhong jiang @ 2016-05-24  0:55 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

On Tue, 24 May 2016 08:53:14 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:

> 2016-05-24 6:58 GMT+08:00 yunhong jiang
> <yunhong.jiang@linux.intel.com>:
> > On Sun, 22 May 2016 08:21:50 +0800
> > Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> >> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@intel.com>:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: kvm-owner@vger.kernel.org
> >> >> [mailto:kvm-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> >> >> Sent: Friday, May 20, 2016 3:34 AM
> >> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
> >> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
> >> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer
> >> >> for tsc deadline timer
> >> >>
> >> >>
> >> >>
> >> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
> >> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
> >> >> >
> >> >> > Utilizing the VMX preemption timer for tsc deadline timer
> >> >> > virtualization. The VMX preemption timer is armed when the
> >> >> > vCPU is running, and a VMExit will happen if the virtual TSC
> >> >> > deadline timer expires.
> >> >> >
> >> >> > When the vCPU thread is scheduled out, the tsc deadline timer
> >> >> > virtualization will be switched to use the current solution,
> >> >> > i.e. use the timer for it. It's switched back to VMX
> >> >> > preemption timer when the vCPU thread is scheduled in.
> >> >> >
> >> >> > This solution avoids the OS's complex hrtimer system, and also
> >> >> > the host timer interrupt handling cost, with a
> >> >> > preemption_timer VMexit. It fits well for some NFV usage
> >> >> > scenario, when the vCPU is bound to a pCPU and the pCPU is
> >> >> > isolated, or some similar scenario.
> >> >> >
> >> >> > However, it possibly has impact if the vCPU thread is
> >> >> > scheduled in/out very frequently, because it switches from/to
> >> >> > the hrtimer emulation a lot.
> >> >> >
> >> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> >> >> > ---
> >> >> >  arch/x86/kvm/lapic.c | 108
> >> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
> >> >> >  arch/x86/kvm/lapic.h |  10 +++++
> >> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> >> >> >  arch/x86/kvm/x86.c   |   6 +++
> >> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> >> > index 5776473be362..a613bcfda59a 100644
> >> >> > --- a/arch/x86/kvm/x86.c
> >> >> > +++ b/arch/x86/kvm/x86.c
> >> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct
> >> >> > kvm_vcpu
> >> >> *vcpu)
> >> >> >
> >> >> >     local_irq_disable();
> >> >> >
> >> >> > +   inject_expired_hwemul_timer(vcpu);
> >> >>
> >> >> Is this really fast enough (and does it trigger often enough)
> >> >> that it is worth slowing down all vmenters?
> >> >>
> >> >> I'd rather call inject_expired_hwemul_timer from the preemption
> >> >> timer vmexit handler instead.  inject_pending_hwemul_timer will
> >> >> set the preemption timer countdown to zero if the deadline of
> >> >> the guest LAPIC timer has passed already.  This should be
> >> >> relatively rare.
> >> >
> >> > Sure, and I will take this approach in the new patch set. I'd
> >> > give some reason why it's this way now. Originally this patch was for
> >> > cyclictest on guest with latency less than 15us for 24 hours. So,
> >> > if the timer expires already before VM entry, we try to inject it
> >> > immediately, instead of waiting for an extra VMExit, which may be
> >> > 4~5 us.
> >>
> >> inject_expired_hwemul_timer() just sets the pending bit, and a
> >> vmexit is still needed to finally exit to vcpu_run(), which is the
> >> only place that checks pending and injects APIC_LVTT, so how does
> >> adding inject_expired_hwemul_timer() in vcpu_enter_guest() avoid an
> >> extra vmexit?
> >
> > The inject_expired_hwemul_timer() will invoke kvm_make_request(),
> > which will cause KVM to try to inject the timer interrupt directly.
> > Please notice that vcpu_enter_guest() will recheck the requests
> > later.
> 
> Actually I didn't find another place that injects pending timer irqs
> except in vcpu_run, though the comment on kvm_set_pending_timer
> mentions that it is implicitly checked in vcpu_enter_guest.

Hi Wanpeng, thanks for checking. Please have a look at the changes to
arch/x86/kvm/x86.c in the patch; inject_expired_hwemul_timer() is called
twice there.
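
In rough terms, what it does at each call site is check whether the
guest's tsc deadline has already passed and, if so, request the timer
injection. A simplified sketch (not the exact patch code; the details
follow the description in this thread):

    static void inject_expired_hwemul_timer(struct kvm_vcpu *vcpu)
    {
            struct kvm_lapic *apic = vcpu->arch.apic;
            u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;

            if (!tscdeadline)
                    return;

            /* Compare the guest's current tsc with the deadline... */
            guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());

            /*
             * ...and if it has already passed, request the injection
             * right away instead of waiting for an extra VMExit.
             */
            if (guest_tsc >= tscdeadline)
                    kvm_set_pending_timer(vcpu);
    }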

Of course, per Paolo's review, this code path will be removed in the next
submission.

Thanks
--jyh

> 
> Regards,
> Wanpeng Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-24  0:55             ` yunhong jiang
@ 2016-05-24  1:16               ` Wanpeng Li
  2016-05-24  1:20                 ` yunhong jiang
  0 siblings, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2016-05-24  1:16 UTC (permalink / raw)
  To: yunhong jiang; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

2016-05-24 8:55 GMT+08:00 yunhong jiang <yunhong.jiang@linux.intel.com>:
> On Tue, 24 May 2016 08:53:14 +0800
> Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> 2016-05-24 6:58 GMT+08:00 yunhong jiang
>> <yunhong.jiang@linux.intel.com>:
>> > On Sun, 22 May 2016 08:21:50 +0800
>> > Wanpeng Li <kernellwp@gmail.com> wrote:
>> >
>> >> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@intel.com>:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: kvm-owner@vger.kernel.org
>> >> >> [mailto:kvm-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
>> >> >> Sent: Friday, May 20, 2016 3:34 AM
>> >> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
>> >> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
>> >> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer
>> >> >> for tsc deadline timer
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
>> >> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
>> >> >> >
>> >> >> > Utilizing the VMX preemption timer for tsc deadline timer
>> >> >> > virtualization. The VMX preemption timer is armed when the
>> >> >> > vCPU is running, and a VMExit will happen if the virtual TSC
>> >> >> > deadline timer expires.
>> >> >> >
>> >> >> > When the vCPU thread is scheduled out, the tsc deadline timer
>> >> >> > virtualization will be switched to use the current solution,
>> >> >> > i.e. use the timer for it. It's switched back to VMX
>> >> >> > preemption timer when the vCPU thread is scheduled in.
>> >> >> >
>> >> >> > This solution avoids the OS's complex hrtimer system, and also
>> >> >> > the host timer interrupt handling cost, with a
>> >> >> > preemption_timer VMexit. It fits well for some NFV usage
>> >> >> > scenario, when the vCPU is bound to a pCPU and the pCPU is
>> >> >> > isolated, or some similar scenario.
>> >> >> >
>> >> >> > However, it possibly has impact if the vCPU thread is
>> >> >> > scheduled in/out very frequently, because it switches from/to
>> >> >> > the hrtimer emulation a lot.
>> >> >> >
>> >> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
>> >> >> > ---
>> >> >> >  arch/x86/kvm/lapic.c | 108
>> >> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
>> >> >> >  arch/x86/kvm/lapic.h |  10 +++++
>> >> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>> >> >> >  arch/x86/kvm/x86.c   |   6 +++
>> >> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
>> >> >> >
>> >> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> >> > index 5776473be362..a613bcfda59a 100644
>> >> >> > --- a/arch/x86/kvm/x86.c
>> >> >> > +++ b/arch/x86/kvm/x86.c
>> >> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct
>> >> >> > kvm_vcpu
>> >> >> *vcpu)
>> >> >> >
>> >> >> >     local_irq_disable();
>> >> >> >
>> >> >> > +   inject_expired_hwemul_timer(vcpu);
>> >> >>
>> >> >> Is this really fast enough (and does it trigger often enough)
>> >> >> that it is worth slowing down all vmenters?
>> >> >>
>> >> >> I'd rather call inject_expired_hwemul_timer from the preemption
>> >> >> timer vmexit handler instead.  inject_pending_hwemul_timer will
>> >> >> set the preemption timer countdown to zero if the deadline of
>> >> >> the guest LAPIC timer has passed already.  This should be
>> >> >> relatively rare.
>> >> >
>> >> > Sure, and I will take this approach in the new patch set. I'd
>> >> > give some reason why it's this way now. Originally this patch was
>> >> > cyclictest on guest with latency less than 15us for 24 hours. So,
>> >> > if the timer expires already before VM entry, we try to inject it
>> >> > immediately, instead of waiting for an extra VMExit, which may be
>> >> > 4~5 us.
>> >>
>> >> inject_expired_hwemul_timer() just sets the pending bit, and a
>> >> vmexit is still needed to finally exit to vcpu_run(), which is the
>> >> only place that checks pending and injects APIC_LVTT, so how does
>> >> adding inject_expired_hwemul_timer() in vcpu_enter_guest() avoid an
>> >> extra vmexit?
>> >
>> > The inject_expired_hwemul_timer() will invoke kvm_make_request(),
>> > which will cause KVM to try to inject the timer interrupt directly.
>> > Please notice that vcpu_enter_guest() will recheck the requests
>> > later.
>>
>> Actually I didn't find another place that injects pending timer irqs
>> except in vcpu_run, though the comment on kvm_set_pending_timer
>> mentions that it is implicitly checked in vcpu_enter_guest.
>
> Hi Wanpeng, thanks for checking. Please have a look at the changes to
> arch/x86/kvm/x86.c in the patch; inject_expired_hwemul_timer() is
> called twice there.

Yes.

>
> Of course, per Paolo's review, this code path will be removed in the next
> submission.

So my question is still why there is no timer irq injection in
vcpu_enter_guest, though the comment on kvm_set_pending_timer mentions
that it is implicitly checked in vcpu_enter_guest.

Ping Paolo, ;-)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-24  1:16               ` Wanpeng Li
@ 2016-05-24  1:20                 ` yunhong jiang
  2016-05-24  1:32                   ` Wanpeng Li
  0 siblings, 1 reply; 29+ messages in thread
From: yunhong jiang @ 2016-05-24  1:20 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

On Tue, 24 May 2016 09:16:03 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:

> 2016-05-24 8:55 GMT+08:00 yunhong jiang
> <yunhong.jiang@linux.intel.com>:
> > On Tue, 24 May 2016 08:53:14 +0800
> > Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> >> 2016-05-24 6:58 GMT+08:00 yunhong jiang
> >> <yunhong.jiang@linux.intel.com>:
> >> > On Sun, 22 May 2016 08:21:50 +0800
> >> > Wanpeng Li <kernellwp@gmail.com> wrote:
> >> >
> >> >> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong
> >> >> <yunhong.jiang@intel.com>:
> >> >> >
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: kvm-owner@vger.kernel.org
> >> >> >> [mailto:kvm-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> >> >> >> Sent: Friday, May 20, 2016 3:34 AM
> >> >> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
> >> >> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
> >> >> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer
> >> >> >> for tsc deadline timer
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
> >> >> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
> >> >> >> >
> >> >> >> > Utilizing the VMX preemption timer for tsc deadline timer
> >> >> >> > virtualization. The VMX preemption timer is armed when the
> >> >> >> > vCPU is running, and a VMExit will happen if the virtual
> >> >> >> > TSC deadline timer expires.
> >> >> >> >
> >> >> >> > When the vCPU thread is scheduled out, the tsc deadline
> >> >> >> > timer virtualization will be switched to use the current
> >> >> >> > solution, i.e. use the timer for it. It's switched back to
> >> >> >> > VMX preemption timer when the vCPU thread is scheduled in.
> >> >> >> >
> >> >> >> > This solution avoids the OS's complex hrtimer system, and
> >> >> >> > also the host timer interrupt handling cost, with a
> >> >> >> > preemption_timer VMexit. It fits well for some NFV usage
> >> >> >> > scenario, when the vCPU is bound to a pCPU and the pCPU is
> >> >> >> > isolated, or some similar scenario.
> >> >> >> >
> >> >> >> > However, it possibly has impact if the vCPU thread is
> >> >> >> > scheduled in/out very frequently, because it switches
> >> >> >> > from/to the hrtimer emulation a lot.
> >> >> >> >
> >> >> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
> >> >> >> > ---
> >> >> >> >  arch/x86/kvm/lapic.c | 108
> >> >> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
> >> >> >> >  arch/x86/kvm/lapic.h |  10 +++++
> >> >> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> >> >> >> >  arch/x86/kvm/x86.c   |   6 +++
> >> >> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
> >> >> >> >
> >> >> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> >> >> > index 5776473be362..a613bcfda59a 100644
> >> >> >> > --- a/arch/x86/kvm/x86.c
> >> >> >> > +++ b/arch/x86/kvm/x86.c
> >> >> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct
> >> >> >> > kvm_vcpu
> >> >> >> *vcpu)
> >> >> >> >
> >> >> >> >     local_irq_disable();
> >> >> >> >
> >> >> >> > +   inject_expired_hwemul_timer(vcpu);
> >> >> >>
> >> >> >> Is this really fast enough (and does it trigger often enough)
> >> >> >> that it is worth slowing down all vmenters?
> >> >> >>
> >> >> >> I'd rather call inject_expired_hwemul_timer from the
> >> >> >> preemption timer vmexit handler instead.
> >> >> >> inject_pending_hwemul_timer will set the preemption timer
> >> >> >> countdown to zero if the deadline of the guest LAPIC timer
> >> >> >> has passed already.  This should be relatively rare.
> >> >> >
> >> >> > Sure, and I will take this approach in the new patch set. I'd
> >> >> > give some reason why it's this way now. Originally this patch was
> >> >> > for cyclictest on guest with latency less than 15us for 24
> >> >> > hours. So, if the timer expires already before VM entry, we
> >> >> > try to inject it immediately, instead of waiting for an extra
> >> >> > VMExit, which may be 4~5 us.
> >> >>
> >> >> inject_expired_hwemul_timer() just sets the pending bit, and a
> >> >> vmexit is still needed to finally exit to vcpu_run(), which is the
> >> >> only place that checks pending and injects APIC_LVTT, so how does
> >> >> adding inject_expired_hwemul_timer() in vcpu_enter_guest() avoid
> >> >> an extra vmexit?
> >> >
> >> > The inject_expired_hwemul_timer() will invoke kvm_make_request(),
> >> > which will cause KVM to try to inject the timer interrupt directly.
> >> > Please notice that vcpu_enter_guest() will recheck the requests
> >> > later.
> >>
> >> Actually I didn't find another place that injects pending timer
> >> irqs except in vcpu_run, though the comment on kvm_set_pending_timer
> >> mentions that it is implicitly checked in vcpu_enter_guest.
> >
> > Hi Wanpeng, thanks for checking. Please have a look at the changes
> > to arch/x86/kvm/x86.c in the patch; inject_expired_hwemul_timer()
> > is called twice there.
> 
> Yes.
> 
> >
> > Of course, per Paolo's review, this code path will be removed in
> > the next submission.
> 
> So my question is still why there is no timer irq injection in
> vcpu_enter_guest, though the comment on kvm_set_pending_timer mentions
> that it is implicitly checked in vcpu_enter_guest.

Do you mean the code at
http://lxr.free-electrons.com/source/arch/x86/kvm/x86.c#L6607 ? It will check
if there are any events and, if yes, it will exit.
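
Roughly, that check looks like this (a simplified sketch of the 4.6-era
vcpu_enter_guest(); see the link above for the exact code):

    local_irq_disable();

    /*
     * Any request (e.g. KVM_REQ_PENDING_TIMER) made after the earlier
     * checks aborts the entry here...
     */
    if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
        || need_resched() || signal_pending(current)) {
            vcpu->mode = OUTSIDE_GUEST_MODE;
            smp_wmb();
            local_irq_enable();
            preempt_enable();
            r = 1;
            /*
             * ...and control goes back to vcpu_run(), which then
             * injects the pending timer irq.
             */
            goto cancel_injection;
    }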

--jyh
> 
> Ping Paolo, ;-)
> 
> Regards,
> Wanpeng Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
  2016-05-24  1:20                 ` yunhong jiang
@ 2016-05-24  1:32                   ` Wanpeng Li
  0 siblings, 0 replies; 29+ messages in thread
From: Wanpeng Li @ 2016-05-24  1:32 UTC (permalink / raw)
  To: yunhong jiang; +Cc: Jiang, Yunhong, Paolo Bonzini, kvm, rkrcmar

2016-05-24 9:20 GMT+08:00 yunhong jiang <yunhong.jiang@linux.intel.com>:
> On Tue, 24 May 2016 09:16:03 +0800
> Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> 2016-05-24 8:55 GMT+08:00 yunhong jiang
>> <yunhong.jiang@linux.intel.com>:
>> > On Tue, 24 May 2016 08:53:14 +0800
>> > Wanpeng Li <kernellwp@gmail.com> wrote:
>> >
>> >> 2016-05-24 6:58 GMT+08:00 yunhong jiang
>> >> <yunhong.jiang@linux.intel.com>:
>> >> > On Sun, 22 May 2016 08:21:50 +0800
>> >> > Wanpeng Li <kernellwp@gmail.com> wrote:
>> >> >
>> >> >> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong
>> >> >> <yunhong.jiang@intel.com>:
>> >> >> >
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: kvm-owner@vger.kernel.org
>> >> >> >> [mailto:kvm-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
>> >> >> >> Sent: Friday, May 20, 2016 3:34 AM
>> >> >> >> To: Yunhong Jiang <yunhong.jiang@linux.intel.com>;
>> >> >> >> kvm@vger.kernel.org Cc: rkrcmar@redhat.com
>> >> >> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer
>> >> >> >> for tsc deadline timer
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
>> >> >> >> > From: Yunhong Jiang <yunhong.jiang@gmail.com>
>> >> >> >> >
>> >> >> >> > Utilizing the VMX preemption timer for tsc deadline timer
>> >> >> >> > virtualization. The VMX preemption timer is armed when the
>> >> >> >> > vCPU is running, and a VMExit will happen if the virtual
>> >> >> >> > TSC deadline timer expires.
>> >> >> >> >
>> >> >> >> > When the vCPU thread is scheduled out, the tsc deadline
>> >> >> >> > timer virtualization will be switched to use the current
>> >> >> >> > solution, i.e. use the timer for it. It's switched back to
>> >> >> >> > VMX preemption timer when the vCPU thread is scheduled in.
>> >> >> >> >
>> >> >> >> > This solution avoids the OS's complex hrtimer system, and
>> >> >> >> > also the host timer interrupt handling cost, with a
>> >> >> >> > preemption_timer VMexit. It fits well for some NFV usage
>> >> >> >> > scenario, when the vCPU is bound to a pCPU and the pCPU is
>> >> >> >> > isolated, or some similar scenario.
>> >> >> >> >
>> >> >> >> > However, it possibly has impact if the vCPU thread is
>> >> >> >> > scheduled in/out very frequently, because it switches
>> >> >> >> > from/to the hrtimer emulation a lot.
>> >> >> >> >
>> >> >> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
>> >> >> >> > ---
>> >> >> >> >  arch/x86/kvm/lapic.c | 108
>> >> >> >> +++++++++++++++++++++++++++++++++++++++++++++++++--
>> >> >> >> >  arch/x86/kvm/lapic.h |  10 +++++
>> >> >> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>> >> >> >> >  arch/x86/kvm/x86.c   |   6 +++
>> >> >> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
>> >> >> >> >
>> >> >> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> >> >> > index 5776473be362..a613bcfda59a 100644
>> >> >> >> > --- a/arch/x86/kvm/x86.c
>> >> >> >> > +++ b/arch/x86/kvm/x86.c
>> >> >> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct
>> >> >> >> > kvm_vcpu
>> >> >> >> *vcpu)
>> >> >> >> >
>> >> >> >> >     local_irq_disable();
>> >> >> >> >
>> >> >> >> > +   inject_expired_hwemul_timer(vcpu);
>> >> >> >>
>> >> >> >> Is this really fast enough (and does it trigger often enough)
>> >> >> >> that it is worth slowing down all vmenters?
>> >> >> >>
>> >> >> >> I'd rather call inject_expired_hwemul_timer from the
>> >> >> >> preemption timer vmexit handler instead.
>> >> >> >> inject_pending_hwemul_timer will set the preemption timer
>> >> >> >> countdown to zero if the deadline of the guest LAPIC timer
>> >> >> >> has passed already.  This should be relatively rare.
>> >> >> >
>> >> >> > Sure, and I will take this approach in the new patch set. I'd
>> >> >> > give some reason why it's this way now. Originally this patch was
>> >> >> > for cyclictest on guest with latency less than 15us for 24
>> >> >> > hours. So, if the timer expires already before VM entry, we
>> >> >> > try to inject it immediately, instead of waiting for an extra
>> >> >> > VMExit, which may be 4~5 us.
>> >> >>
>> >> >> inject_expired_hwemul_timer() just sets the pending bit, and a
>> >> >> vmexit is still needed to finally exit to vcpu_run(), which is the
>> >> >> only place that checks pending and injects APIC_LVTT, so how does
>> >> >> adding inject_expired_hwemul_timer() in vcpu_enter_guest() avoid
>> >> >> an extra vmexit?
>> >> >
>> >> > The inject_expired_hwemul_timer() will invoke kvm_make_request(),
>> >> > which will cause KVM to try to inject the timer interrupt directly.
>> >> > Please notice that vcpu_enter_guest() will recheck the requests
>> >> > later.
>> >>
>> >> Actually I didn't find another place that injects pending timer
>> >> irqs except in vcpu_run, though the comment on kvm_set_pending_timer
>> >> mentions that it is implicitly checked in vcpu_enter_guest.
>> >
>> > Hi Wanpeng, thanks for checking. Please have a look at the changes
>> > to arch/x86/kvm/x86.c in the patch; inject_expired_hwemul_timer()
>> > is called twice there.
>>
>> Yes.
>>
>> >
>> > Of course, per Paolo's review, this code path will be removed in
>> > the next submission.
>>
>> So my question is still why there is no timer irq injection in
>> vcpu_enter_guest, though the comment on kvm_set_pending_timer mentions
>> that it is implicitly checked in vcpu_enter_guest.
>
> Do you mean the code at
> http://lxr.free-electrons.com/source/arch/x86/kvm/x86.c#L6607 ? It will check
> if there are any events and, if yes, it will exit.

I see, thanks. ;-)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-05-24  1:32 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20  1:44 [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Yunhong Jiang
2016-05-20  1:44 ` [RFC PATCH 1/5] Add the kvm sched_out hook Yunhong Jiang
2016-05-20  1:45 ` [RFC PATCH 2/5] Utilize the vmx preemption timer Yunhong Jiang
2016-05-20  9:45   ` Paolo Bonzini
2016-05-20  1:45 ` [RFC PATCH 3/5] Separate the start_sw_tscdeadline Yunhong Jiang
2016-05-20 10:16   ` Paolo Bonzini
2016-05-20  1:45 ` [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer Yunhong Jiang
2016-05-20 10:34   ` Paolo Bonzini
2016-05-20 22:06     ` Jiang, Yunhong
2016-05-21 12:38       ` Paolo Bonzini
2016-05-22  0:21       ` Wanpeng Li
2016-05-23 22:58         ` yunhong jiang
2016-05-24  0:53           ` Wanpeng Li
2016-05-24  0:55             ` yunhong jiang
2016-05-24  1:16               ` Wanpeng Li
2016-05-24  1:20                 ` yunhong jiang
2016-05-24  1:32                   ` Wanpeng Li
2016-05-20  1:45 ` [RFC PATCH 5/5] Adding trace for the hwemul_timer Yunhong Jiang
2016-05-20 10:28   ` Paolo Bonzini
2016-05-20  6:03 ` [RFC PATCH 0/5] Utilizing VMX preemption for timer virtualization Jan Kiszka
2016-05-20  9:41   ` Paolo Bonzini
2016-05-20 21:50   ` Jiang, Yunhong
2016-05-20 18:18 ` Marcelo Tosatti
2016-05-20 18:21   ` Marcelo Tosatti
2016-05-20 20:49   ` Paolo Bonzini
2016-05-20 22:27     ` Jiang, Yunhong
2016-05-20 23:53       ` yunhong jiang
2016-05-20 22:18   ` Jiang, Yunhong
2016-05-21  0:45     ` Marcelo Tosatti
