All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/7] kvm pvtimer
@ 2017-12-08  8:39 Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR Quan Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo

From: Ben Luo <bn0418@gmail.com>

This patchset introduces a new paravirtualized mechanism to reduce VM-exit
caused by guest timer accessing.

In general, KVM guest programs tsc-deadline timestamp to MSR_IA32_TSC_DEADLINE
MSR. This will cause a VM-exit, and then KVM handles this timer for guest.

Also kvm always registers timer on the CPU which vCPU was running on. Even
though vCPU thread is rescheduled to another CPU, the timer will be migrated
to the target CPU as well. When timer expired, timer interrupt could make
guest-mode vCPU VM-exit on this CPU.

When pvtimer is enabled:

   - The tsc-deadline timestamp is mostly recorded in share page with less
     VM-exit. We Introduce a periodically working kthread to scan share page
     and synchronize timer setting for guest on a dedicated CPU.

   - Since the working kthread scans periodically, some of the timer events
     may be lost or delayed. We have to program these tsc-deadline timestamps
     to MSR_IA32_TSC_DEADLINE as normal, which will cause VM-exit and KVM will
     signal the working thread through IPI to program timer, instread of
     registering on current CPU.

   - Timer interrupt will be delivered by posted interrupt mechanism to vCPUs
     with less VM-exit.

Ben Luo (7):
  kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR
  kvm: x86: add a function to exchange value
  KVM: timer: synchronize tsc-deadline timestamp for guest
  KVM: timer: program timer to a dedicated CPU
  KVM: timer: ignore timer migration if pvtimer is enabled
  Doc/KVM: introduce a new cpuid bit for kvm pvtimer
  kvm: guest: reprogram guest timer

 Documentation/virtual/kvm/cpuid.txt  |    4 +
 arch/x86/include/asm/kvm_host.h      |    5 +
 arch/x86/include/asm/kvm_para.h      |    9 ++
 arch/x86/include/uapi/asm/kvm_para.h |    7 ++
 arch/x86/kernel/apic/apic.c          |    9 +-
 arch/x86/kernel/kvm.c                |   38 ++++++++
 arch/x86/kvm/cpuid.c                 |    1 +
 arch/x86/kvm/lapic.c                 |  170 +++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h                 |   11 ++
 arch/x86/kvm/x86.c                   |   15 +++-
 include/linux/kvm_host.h             |    3 +
 virt/kvm/kvm_main.c                  |   42 +++++++++
 12 files changed, 308 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 2/7] kvm: x86: add a function to exchange value Quan Xu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

Guest enables pv timer functionality using this MSR

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 arch/x86/include/asm/kvm_host.h      |    5 +++++
 arch/x86/include/uapi/asm/kvm_para.h |    6 ++++++
 arch/x86/kvm/lapic.c                 |   22 ++++++++++++++++++++++
 arch/x86/kvm/lapic.h                 |    6 ++++++
 arch/x86/kvm/x86.c                   |    8 ++++++++
 5 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493..641b4aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -684,6 +684,11 @@ struct kvm_vcpu_arch {
 		bool pv_unhalted;
 	} pv;
 
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+	} pv_timer;
+
 	int pending_ioapic_eoi;
 	int pending_external_vector;
 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 554aa8f..3dd6116 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -41,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_PV_TIMER_EN    0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -64,6 +65,11 @@ struct kvm_clock_pairing {
 #define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
 #define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
 
+struct pvtimer_vcpu_event_info {
+	__u64 expire_tsc;
+	__u64 next_sync_tsc;
+} __attribute__((__packed__));
+
 #define KVM_MAX_MMU_OP_BATCH           32
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 36c90d6..55c9ba3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1991,6 +1991,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_lapic_set_base(vcpu,
 				vcpu->arch.apic_base | MSR_IA32_APICBASE_BSP);
 	vcpu->arch.pv_eoi.msr_val = 0;
+	vcpu->arch.pv_timer.msr_val = 0;
 	apic_update_ppr(apic);
 	if (vcpu->arch.apicv_active) {
 		kvm_x86_ops->apicv_post_state_restore(vcpu);
@@ -2478,6 +2479,27 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
 					 addr, sizeof(u8));
 }
 
+int kvm_lapic_enable_pv_timer(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 addr = data & ~KVM_MSR_ENABLED;
+	int ret;
+
+	if (!lapic_in_kernel(vcpu))
+		return 1;
+
+	if (!IS_ALIGNED(addr, 4))
+		return 1;
+
+	vcpu->arch.pv_timer.msr_val = data;
+	if (!pv_timer_enabled(vcpu))
+		return 0;
+
+	ret = kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_timer.data,
+				addr, sizeof(struct pvtimer_vcpu_event_info));
+
+	return ret;
+}
+
 void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4b9935a..539a738 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -113,6 +113,7 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 }
 
 int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
+int kvm_lapic_enable_pv_timer(struct kvm_vcpu *vcpu, u64 data);
 void kvm_lapic_init(void);
 void kvm_lapic_exit(void);
 
@@ -207,6 +208,11 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
 }
 
+static inline bool pv_timer_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv_timer.msr_val & KVM_MSR_ENABLED;
+}
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb..5668774 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1025,6 +1025,7 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
 	HV_X64_MSR_STIMER0_CONFIG,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_KVM_PV_EOI_EN,
+	MSR_KVM_PV_TIMER_EN,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2279,6 +2280,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_lapic_enable_pv_eoi(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_PV_TIMER_EN:
+		if (kvm_lapic_enable_pv_timer(vcpu, data))
+			return 1;
+		break;
 
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
@@ -2510,6 +2515,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_PV_EOI_EN:
 		msr_info->data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_PV_TIMER_EN:
+		msr_info->data = vcpu->arch.pv_timer.msr_val;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 2/7] kvm: x86: add a function to exchange value
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest Quan Xu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

Introduce kvm_xchg_guest_cached to exchange value with guest
page atomically.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 include/linux/kvm_host.h |    3 +++
 virt/kvm/kvm_main.c      |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6882538..32949ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -688,6 +688,9 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			   void *data, int offset, unsigned long len);
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			      gpa_t gpa, unsigned long len);
+unsigned long kvm_xchg_guest_cached(struct kvm *kvm,
+	struct gfn_to_hva_cache *ghc, unsigned long offset,
+	unsigned long new, int size);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9deb5a2..3149e17 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2010,6 +2010,48 @@ int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
 
+unsigned long kvm_xchg_guest_cached(struct kvm *kvm,
+		struct gfn_to_hva_cache *ghc, unsigned long offset,
+		unsigned long new, int size)
+{
+	unsigned long r;
+	void *kva;
+	struct page *page;
+	kvm_pfn_t pfn;
+
+	WARN_ON(offset > ghc->len);
+
+	pfn = gfn_to_pfn_atomic(kvm, ghc->gpa >> PAGE_SHIFT);
+	page = kvm_pfn_to_page(pfn);
+
+	if (is_error_page(page))
+		return -EFAULT;
+
+	kva = kmap_atomic(page) + offset_in_page(ghc->gpa) + offset;
+	switch (size) {
+	case 1:
+		r = xchg((char *)kva, new);
+		break;
+	case 2:
+		r = xchg((short *)kva, new);
+		break;
+	case 4:
+		r = xchg((int *)kva, new);
+		break;
+	case 8:
+		r = xchg((long *)kva, new);
+		break;
+	default:
+		kunmap_atomic(kva);
+		return -EFAULT;
+	}
+
+	kunmap_atomic(kva);
+	mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+
+	return r;
+}
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
 	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 2/7] kvm: x86: add a function to exchange value Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08 15:06   ` Konrad Rzeszutek Wilk
  2017-12-08  8:39 ` [PATCH RFC 4/7] KVM: timer: program timer to a dedicated CPU Quan Xu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

In general, KVM guest programs tsc-deadline timestamp to
MSR_IA32_TSC_DEADLINE MSR. This will cause a VM-exit, and
then KVM handles this timer for guest.

The tsc-deadline timestamp is mostly recorded in share page
with less VM-exit. We Introduce a periodically working kthread
to scan share page and synchronize timer setting for guest
on a dedicated CPU.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 arch/x86/kvm/lapic.c |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/lapic.h |    5 ++
 2 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 55c9ba3..20a23bb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -36,6 +36,10 @@
 #include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
+#include <linux/ktime.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mmu_context.h>
 #include "kvm_cache_regs.h"
 #include "irq.h"
 #include "trace.h"
@@ -70,6 +74,12 @@
 #define APIC_BROADCAST			0xFF
 #define X2APIC_BROADCAST		0xFFFFFFFFul
 
+static struct hrtimer pv_sync_timer;
+static long pv_timer_period_ns = PVTIMER_PERIOD_NS;
+static struct task_struct *pv_timer_polling_worker;
+
+module_param(pv_timer_period_ns, long, 0644);
+
 static inline int apic_test_vector(int vec, void *bitmap)
 {
 	return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
@@ -2542,8 +2552,130 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	}
 }
 
+static enum hrtimer_restart pv_sync_timer_callback(struct hrtimer *timer)
+{
+	hrtimer_forward_now(timer, ns_to_ktime(pv_timer_period_ns));
+	wake_up_process(pv_timer_polling_worker);
+
+	return HRTIMER_RESTART;
+}
+
+void kvm_apic_sync_pv_timer(void *data)
+{
+	struct kvm_vcpu *vcpu = data;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	unsigned long flags, this_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	u64 guest_tsc, expire_tsc;
+	long rem_tsc;
+
+	if (!lapic_in_kernel(vcpu) || !pv_timer_enabled(vcpu))
+		return;
+
+	local_irq_save(flags);
+	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	rem_tsc = ktime_to_ns(hrtimer_get_remaining(&pv_sync_timer))
+			* this_tsc_khz;
+	if (rem_tsc <= 0)
+		rem_tsc += pv_timer_period_ns * this_tsc_khz;
+	do_div(rem_tsc, 1000000L);
+
+	/*
+	 * make sure guest_tsc and rem_tsc are assigned before to update
+	 * next_sync_tsc.
+	 */
+	smp_wmb();
+	kvm_xchg_guest_cached(vcpu->kvm, &vcpu->arch.pv_timer.data,
+		offsetof(struct pvtimer_vcpu_event_info, next_sync_tsc),
+		guest_tsc + rem_tsc, 8);
+
+	/* make sure next_sync_tsc is visible */
+	smp_wmb();
+
+	expire_tsc = kvm_xchg_guest_cached(vcpu->kvm, &vcpu->arch.pv_timer.data,
+			offsetof(struct pvtimer_vcpu_event_info, expire_tsc),
+			0UL, 8);
+
+	/* make sure expire_tsc is visible */
+	smp_wmb();
+
+	if (expire_tsc) {
+		if (expire_tsc > guest_tsc)
+			/*
+			 * As we bind this thread to a dedicated CPU through
+			 * IPI, the timer is registered on that dedicated
+			 * CPU here.
+			 */
+			kvm_set_lapic_tscdeadline_msr(apic->vcpu, expire_tsc);
+		else
+			/* deliver immediately if expired */
+			kvm_apic_local_deliver(apic, APIC_LVTT);
+	}
+	local_irq_restore(flags);
+}
+
+static int pv_timer_polling(void *arg)
+{
+	struct kvm *kvm;
+	struct kvm_vcpu *vcpu;
+	int i;
+	mm_segment_t oldfs = get_fs();
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		if (kthread_should_stop()) {
+			__set_current_state(TASK_RUNNING);
+			break;
+		}
+
+		spin_lock(&kvm_lock);
+		__set_current_state(TASK_RUNNING);
+		list_for_each_entry(kvm, &vm_list, vm_list) {
+			set_fs(USER_DS);
+			use_mm(kvm->mm);
+			kvm_for_each_vcpu(i, vcpu, kvm) {
+				kvm_apic_sync_pv_timer(vcpu);
+			}
+			unuse_mm(kvm->mm);
+			set_fs(oldfs);
+		}
+
+		spin_unlock(&kvm_lock);
+
+		schedule();
+	}
+
+	return 0;
+}
+
+static void kvm_pv_timer_init(void)
+{
+	ktime_t ktime = ktime_set(0, pv_timer_period_ns);
+
+	hrtimer_init(&pv_sync_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	pv_sync_timer.function = &pv_sync_timer_callback;
+
+	/* kthread for pv_timer sync buffer */
+	pv_timer_polling_worker = kthread_create(pv_timer_polling, NULL,
+						"pv_timer_polling_worker/%d",
+						PVTIMER_SYNC_CPU);
+	if (IS_ERR(pv_timer_polling_worker)) {
+		pr_warn_once("kvm: failed to create thread for pv_timer\n");
+		pv_timer_polling_worker = NULL;
+		hrtimer_cancel(&pv_sync_timer);
+
+		return;
+	}
+
+	kthread_bind(pv_timer_polling_worker, PVTIMER_SYNC_CPU);
+	wake_up_process(pv_timer_polling_worker);
+	hrtimer_start(&pv_sync_timer, ktime, HRTIMER_MODE_REL);
+}
+
 void kvm_lapic_init(void)
 {
+	kvm_pv_timer_init();
+
 	/* do not patch jump label more than once per second */
 	jump_label_rate_limit(&apic_hw_disabled, HZ);
 	jump_label_rate_limit(&apic_sw_disabled, HZ);
@@ -2551,6 +2683,12 @@ void kvm_lapic_init(void)
 
 void kvm_lapic_exit(void)
 {
+	if (pv_timer_polling_worker) {
+		hrtimer_cancel(&pv_sync_timer);
+		kthread_stop(pv_timer_polling_worker);
+		pv_timer_polling_worker = NULL;
+	}
+
 	static_key_deferred_flush(&apic_hw_disabled);
 	static_key_deferred_flush(&apic_sw_disabled);
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 539a738..4588d59 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -16,6 +16,9 @@
 #define APIC_BUS_CYCLE_NS       1
 #define APIC_BUS_FREQUENCY      (1000000000ULL / APIC_BUS_CYCLE_NS)
 
+#define PVTIMER_SYNC_CPU	(NR_CPUS - 1) /* dedicated CPU */
+#define PVTIMER_PERIOD_NS	250000L /* pvtimer default period */
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
@@ -213,6 +216,8 @@ static inline bool pv_timer_enabled(struct kvm_vcpu *vcpu)
 	return vcpu->arch.pv_timer.msr_val & KVM_MSR_ENABLED;
 }
 
+void kvm_apic_sync_pv_timer(void *data);
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 4/7] KVM: timer: program timer to a dedicated CPU
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
                   ` (2 preceding siblings ...)
  2017-12-08  8:39 ` [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 5/7] KVM: timer: ignore timer migration if pvtimer is enabled Quan Xu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

KVM always registers timer on the CPU which vCPU was running on.
Even though vCPU thread is rescheduled to another CPU, the timer
will be migrated to the target CPU as well. When timer expired,
timer interrupt could make guest-mode vCPU VM-exit on this CPU.

Since the working kthread scans periodically, some of the timer
events may be lost or delayed. We have to program these tsc-
deadline timestamps to MSR_IA32_TSC_DEADLINE as normal, which
will cause VM-exit and KVM will signal the working thread through
IPI to program timer, instread of registering on current CPU.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 arch/x86/kvm/lapic.c |    8 +++++++-
 arch/x86/kvm/x86.c   |    7 ++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 20a23bb..5835a27 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2072,7 +2072,13 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
 	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 	struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);
 
-	apic_timer_expired(apic);
+
+	if (pv_timer_enabled(apic->vcpu)) {
+		kvm_apic_local_deliver(apic, APIC_LVTT);
+		if (apic_lvtt_tscdeadline(apic))
+			apic->lapic_timer.tscdeadline = 0;
+	} else
+		apic_timer_expired(apic);
 
 	if (lapic_is_periodic(apic)) {
 		advance_periodic_target_expiration(apic);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5668774..3cbb223 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "x86.h"
+#include "lapic.h"
 #include "cpuid.h"
 #include "pmu.h"
 #include "hyperv.h"
@@ -2196,7 +2197,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
 		return kvm_x2apic_msr_write(vcpu, msr, data);
 	case MSR_IA32_TSCDEADLINE:
-		kvm_set_lapic_tscdeadline_msr(vcpu, data);
+		if (pv_timer_enabled(vcpu))
+			smp_call_function_single(PVTIMER_SYNC_CPU,
+					kvm_apic_sync_pv_timer, vcpu, 0);
+		else
+			kvm_set_lapic_tscdeadline_msr(vcpu, data);
 		break;
 	case MSR_IA32_TSC_ADJUST:
 		if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 5/7] KVM: timer: ignore timer migration if pvtimer is enabled
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
                   ` (3 preceding siblings ...)
  2017-12-08  8:39 ` [PATCH RFC 4/7] KVM: timer: program timer to a dedicated CPU Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 6/7] Doc/KVM: introduce a new cpuid bit for kvm pvtimer Quan Xu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

When pvtimer is enabled, KVM programs timer to a dedicated CPU
through IPI. Whether the vCPU is on the dedicated CPU or any
other CPU, the timer interrupt will be delivered properly.
No need to migrate timer.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 arch/x86/kvm/lapic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5835a27..265efe6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2282,7 +2282,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 {
 	struct hrtimer *timer;
 
-	if (!lapic_in_kernel(vcpu))
+	if (!lapic_in_kernel(vcpu) || pv_timer_enabled(vcpu))
 		return;
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 6/7] Doc/KVM: introduce a new cpuid bit for kvm pvtimer
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
                   ` (4 preceding siblings ...)
  2017-12-08  8:39 ` [PATCH RFC 5/7] KVM: timer: ignore timer migration if pvtimer is enabled Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08  8:39 ` [PATCH RFC 7/7] kvm: guest: reprogram guest timer Quan Xu
  2017-12-08 15:10 ` [PATCH RFC 0/7] kvm pvtimer Konrad Rzeszutek Wilk
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

KVM_FEATURE_PV_TIMER enables guest to check whether pvtimer
can be enabled in guest.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 Documentation/virtual/kvm/cpuid.txt  |    4 ++++
 arch/x86/include/uapi/asm/kvm_para.h |    1 +
 arch/x86/kvm/cpuid.c                 |    1 +
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..b26b31c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_TIMER               ||     8 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || timer support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 3dd6116..46734a8 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_PV_TIMER		8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..e02fd23 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -593,6 +593,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
+			     (1 << KVM_FEATURE_PV_TIMER) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 			     (1 << KVM_FEATURE_PV_UNHALT);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 7/7] kvm: guest: reprogram guest timer
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
                   ` (5 preceding siblings ...)
  2017-12-08  8:39 ` [PATCH RFC 6/7] Doc/KVM: introduce a new cpuid bit for kvm pvtimer Quan Xu
@ 2017-12-08  8:39 ` Quan Xu
  2017-12-08 15:10 ` [PATCH RFC 0/7] kvm pvtimer Konrad Rzeszutek Wilk
  7 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-08  8:39 UTC (permalink / raw)
  To: pbonzini, rkrcmar; +Cc: yang.zhang.wz, kvm, linux-kernel, Ben Luo, Quan Xu

From: Ben Luo <bn0418@gmail.com>

In general, KVM guest programs tsc-deadline timestamp to
MSR_IA32_TSC_DEADLINE MSR.

When pvtimer is enabled, we introduce a new mechanism to
reprogram KVM guest timer. A periodically working kthread
scans share page and synchronize timer setting for guest
on a dedicated CPU. The next time event of the periodically
working kthread is a threshold to decide whether to program
tsc-deadline timestamp to MSR_IA32_TSC_DEADLINE MSR, or to
share page.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Signed-off-by: Ben Luo <bn0418@gmail.com>
---
 arch/x86/include/asm/kvm_para.h |    9 +++++++++
 arch/x86/kernel/apic/apic.c     |    9 ++++++---
 arch/x86/kernel/kvm.c           |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index c373e44..109e706 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -5,6 +5,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <uapi/asm/kvm_para.h>
+#include <linux/hrtimer.h>
 
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
@@ -92,6 +93,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
+int kvm_pv_timer_next_event(unsigned long tsc,
+		struct clock_event_device *evt);
 extern void kvm_disable_steal_time(void);
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
@@ -126,6 +129,12 @@ static inline void kvm_disable_steal_time(void)
 {
 	return;
 }
+
+static inline int kvm_pv_timer_next_event(unsigned long tsc,
+			struct clock_event_device *evt)
+{
+	return 0;
+}
 #endif
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ff89177..286c1b3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -471,10 +471,13 @@ static int lapic_next_event(unsigned long delta,
 static int lapic_next_deadline(unsigned long delta,
 			       struct clock_event_device *evt)
 {
-	u64 tsc;
+	u64 tsc = rdtsc() + (((u64) delta) * TSC_DIVISOR);
 
-	tsc = rdtsc();
-	wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
+	/* TODO: undisciplined function call */
+	if (kvm_pv_timer_next_event(tsc, evt))
+		return 0;
+
+	wrmsrl(MSR_IA32_TSC_DEADLINE, tsc);
 	return 0;
 }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..ec7aff1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -328,6 +328,35 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 	apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
+static DEFINE_PER_CPU(int, pvtimer_enabled);
+static DEFINE_PER_CPU(struct pvtimer_vcpu_event_info,
+		      pvtimer_shared_buf) = {0};
+#define PVTIMER_PADDING		25000
+int kvm_pv_timer_next_event(unsigned long tsc,
+		struct clock_event_device *evt)
+{
+	struct pvtimer_vcpu_event_info *src;
+	u64 now;
+
+	if (!this_cpu_read(pvtimer_enabled))
+		return false;
+
+	src = this_cpu_ptr(&pvtimer_shared_buf);
+	xchg((u64 *)&src->expire_tsc, tsc);
+
+	barrier();
+
+	if (tsc < src->next_sync_tsc)
+		return false;
+
+	rdtscll(now);
+	if (tsc < now || tsc - now < PVTIMER_PADDING)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(kvm_pv_timer_next_event);
+
 static void kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -362,6 +391,15 @@ static void kvm_guest_cpu_init(void)
 
 	if (has_steal_clock)
 		kvm_register_steal_time();
+
+	if (kvm_para_has_feature(KVM_FEATURE_PV_TIMER)) {
+		unsigned long data;
+
+		data  = slow_virt_to_phys(this_cpu_ptr(&pvtimer_shared_buf))
+					  | KVM_MSR_ENABLED;
+		wrmsrl(MSR_KVM_PV_TIMER_EN, data);
+		this_cpu_write(pvtimer_enabled, 1);
+	}
 }
 
 static void kvm_pv_disable_apf(void)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest
  2017-12-08  8:39 ` [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest Quan Xu
@ 2017-12-08 15:06   ` Konrad Rzeszutek Wilk
  2017-12-14  1:54     ` Quan Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-12-08 15:06 UTC (permalink / raw)
  To: Quan Xu; +Cc: pbonzini, rkrcmar, yang.zhang.wz, kvm, linux-kernel, Ben Luo

On Fri, Dec 08, 2017 at 04:39:46PM +0800, Quan Xu wrote:
> From: Ben Luo <bn0418@gmail.com>
> 
> In general, KVM guest programs tsc-deadline timestamp to
> MSR_IA32_TSC_DEADLINE MSR. This will cause a VM-exit, and
> then KVM handles this timer for guest.
> 
> The tsc-deadline timestamp is mostly recorded in share page
> with less VM-exit. We Introduce a periodically working kthread
> to scan share page and synchronize timer setting for guest
> on a dedicated CPU.

That sounds like a race. Meaning the guest may put too small window
and this 'working thread to scan' may not get to it fast enough?

Meaning we miss the deadline to inject the timer in the guest.

Or is this part of this PV MSR semantics - that it will only work
for certain amount of values and anything less than say 1ms
should not use the PV MSR?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
  2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
                   ` (6 preceding siblings ...)
  2017-12-08  8:39 ` [PATCH RFC 7/7] kvm: guest: reprogram guest timer Quan Xu
@ 2017-12-08 15:10 ` Konrad Rzeszutek Wilk
       [not found]   ` <CAFv8KnF1Re7Zn2LVvqh=Sr8MmaWJJwZBB1i5ws03GvPVKXRjzA@mail.gmail.com>
  7 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-12-08 15:10 UTC (permalink / raw)
  To: Quan Xu; +Cc: pbonzini, rkrcmar, yang.zhang.wz, kvm, linux-kernel, Ben Luo

On Fri, Dec 08, 2017 at 04:39:43PM +0800, Quan Xu wrote:
> From: Ben Luo <bn0418@gmail.com>
> 
> This patchset introduces a new paravirtualized mechanism to reduce VM-exit
> caused by guest timer accessing.

And how bad is this blib in arming the timer?

And how often do you get this timer to be armed? OR better yet - what
are the workloads in which you found this VMExit to be painful?

Thanks. Or better yet - what
are the workloads in which you found this VMExit to be painful?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
       [not found]   ` <CAFv8KnF1Re7Zn2LVvqh=Sr8MmaWJJwZBB1i5ws03GvPVKXRjzA@mail.gmail.com>
@ 2017-12-13 16:28     ` Konrad Rzeszutek Wilk
  2017-12-14  2:32       ` Quan Xu
  2017-12-14 11:56       ` Paolo Bonzini
  0 siblings, 2 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-12-13 16:28 UTC (permalink / raw)
  To: Quan Xu; +Cc: Paolo Bonzini, Radim Krcmar, Yang Zhang, kvm, LKML, Ben Luo

On Wed, Dec 13, 2017 at 11:25:13PM +0800, Quan Xu wrote:
> On Fri, Dec 8, 2017 at 11:10 PM, Konrad Rzeszutek Wilk <
> konrad.wilk@oracle.com> wrote:
> 
> > On Fri, Dec 08, 2017 at 04:39:43PM +0800, Quan Xu wrote:
> > > From: Ben Luo <bn0418@gmail.com>
> > >
> > > This patchset introduces a new paravirtualized mechanism to reduce
> > VM-exit
> > > caused by guest timer accessing.
> >
> > And how bad is this blib in arming the timer?
> >
> > And how often do you get this timer to be armed? OR better yet - what
> > are the workloads in which you found this VMExit to be painful?
> >
> > Thanks. Or better yet - what
> > are the workloads in which you found this VMExit to be painful?
> >
> 
> one painful point is from VM idle path..
>  for some network req/resp services, or benchmark of  process context
> switches..

So:

1) VM idle path and network req/resp services:

Does this go away if you don't hit the idle path? Meaning if you
loop without hitting HLT/MWAIT? I am assuming the issue you are facing
is the latency - that is first time the guest comes from HLT and
responds to the packet the latency is much higher than without?

And the arming of the timer? 
2) process context switches.

Is that related to the 1)? That is the 'schedule' call and the process
going to sleep waiting for an interrupt or timer?

This all sounds like issues with low-CPU usage workloads where you
need low latency responses?

> 
> 
> 
> Quan
> Alibaba Cloud

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest
  2017-12-08 15:06   ` Konrad Rzeszutek Wilk
@ 2017-12-14  1:54     ` Quan Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-14  1:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: pbonzini, rkrcmar, yang.zhang.wz, kvm, linux-kernel, Ben Luo



On 2017/12/08 23:06, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 08, 2017 at 04:39:46PM +0800, Quan Xu wrote:
>> From: Ben Luo <bn0418@gmail.com>
>>
>> In general, KVM guest programs tsc-deadline timestamp to
>> MSR_IA32_TSC_DEADLINE MSR. This will cause a VM-exit, and
>> then KVM handles this timer for guest.
>>
>> The tsc-deadline timestamp is mostly recorded in share page
>> with less VM-exit. We Introduce a periodically working kthread
>> to scan share page and synchronize timer setting for guest
>> on a dedicated CPU.
> That sounds like a race. Meaning the guest may put too small window
> and this 'working thread to scan' may not get to it fast enough?
yes, you are right. So ..
> .
> Meaning we miss the deadline to inject the timer in the guest.
>
> Or is this part of this PV MSR semantics - that it will only work
> for certain amount of values and anything less than say 1ms
> should not use the PV MSR?

..
for these timers, We have to program these tsc-deadline timestamps
to MSR_IA32_TSC_DEADLINE as normal, which will cause VM-exit and KVM will
signal the working thread through IPI to program timer, instead of
registering on current CPU (patch 0004).

more detail in patch 0007.

Quan
Alibaba Cloud

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
  2017-12-13 16:28     ` Konrad Rzeszutek Wilk
@ 2017-12-14  2:32       ` Quan Xu
  2017-12-14 11:56       ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Quan Xu @ 2017-12-14  2:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Paolo Bonzini, Radim Krcmar, Yang Zhang, kvm, LKML, Ben Luo



On 2017/12/14 00:28, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 13, 2017 at 11:25:13PM +0800, Quan Xu wrote:
>> On Fri, Dec 8, 2017 at 11:10 PM, Konrad Rzeszutek Wilk <
>> konrad.wilk@oracle.com> wrote:
>>
>>> On Fri, Dec 08, 2017 at 04:39:43PM +0800, Quan Xu wrote:
>>>> From: Ben Luo <bn0418@gmail.com>
>>>>
>>>> This patchset introduces a new paravirtualized mechanism to reduce
>>> VM-exit
>>>> caused by guest timer accessing.
>>> And how bad is this blib in arming the timer?
>>>
>>> And how often do you get this timer to be armed? OR better yet - what
>>> are the workloads in which you found this VMExit to be painful?
>>>
>>> Thanks. Or better yet - what
>>> are the workloads in which you found this VMExit to be painful?
>>>
>> one painful point is from VM idle path..
>>   for some network req/resp services, or benchmark of  process context
>> switches..
> So:
>
> 1) VM idle path and network req/resp services:
>
> Does this go away if you don't hit the idle path? Meaning if you
> loop without hitting HLT/MWAIT?
   we still hit HLT.. we can use it with 
https://lkml.org/lkml/2017/8/29/279 ..
>   I am assuming the issue you are facing
> is the latency - that is first time the guest comes from HLT and
> responds to the packet the latency is much higher than without?
yes,
> And the arming of the timer?
> 2) process context switches.
>
> Is that related to the 1)? That is the 'schedule' call and the process
> going to sleep waiting for an interrupt or timer?
>
> This all sounds like issues with low-CPU usage workloads where you
> need low latency responses?
yes,  it is also helpful to some timer-intensive services.

Quan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
  2017-12-13 16:28     ` Konrad Rzeszutek Wilk
  2017-12-14  2:32       ` Quan Xu
@ 2017-12-14 11:56       ` Paolo Bonzini
  2017-12-14 12:06         ` Quan Xu
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2017-12-14 11:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Quan Xu
  Cc: Radim Krcmar, Yang Zhang, kvm, LKML, Ben Luo

On 13/12/2017 17:28, Konrad Rzeszutek Wilk wrote:
> 1) VM idle path and network req/resp services:
> 
> Does this go away if you don't hit the idle path? Meaning if you
> loop without hitting HLT/MWAIT? I am assuming the issue you are facing
> is the latency - that is first time the guest comes from HLT and
> responds to the packet the latency is much higher than without?
> 
> And the arming of the timer? 
> 2) process context switches.
> 
> Is that related to the 1)? That is the 'schedule' call and the process
> going to sleep waiting for an interrupt or timer?
> 
> This all sounds like issues with low-CPU usage workloads where you
> need low latency responses?

Even high-CPU usage, as long as there is a small idle time.  The cost of
setting the TSC deadline timer twice is about 3000 cycles.

However, I think Amazon's approach of not intercepting HLT/MWAIT/PAUSE
can recover most of the performance and it's way less intrusive.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
  2017-12-14 11:56       ` Paolo Bonzini
@ 2017-12-14 12:06         ` Quan Xu
  2017-12-14 13:00           ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Quan Xu @ 2017-12-14 12:06 UTC (permalink / raw)
  To: Paolo Bonzini, Konrad Rzeszutek Wilk
  Cc: Radim Krcmar, Yang Zhang, kvm, LKML, Ben Luo



On 2017/12/14 19:56, Paolo Bonzini wrote:
> On 13/12/2017 17:28, Konrad Rzeszutek Wilk wrote:
>> 1) VM idle path and network req/resp services:
>>
>> Does this go away if you don't hit the idle path? Meaning if you
>> loop without hitting HLT/MWAIT? I am assuming the issue you are facing
>> is the latency - that is first time the guest comes from HLT and
>> responds to the packet the latency is much higher than without?
>>
>> And the arming of the timer?
>> 2) process context switches.
>>
>> Is that related to the 1)? That is the 'schedule' call and the process
>> going to sleep waiting for an interrupt or timer?
>>
>> This all sounds like issues with low-CPU usage workloads where you
>> need low latency responses?
> Even high-CPU usage, as long as there is a small idle time.  The cost of
> setting the TSC deadline timer twice is about 3000 cycles.
>
> However, I think Amazon's approach of not intercepting HLT/MWAIT/PAUSE
> can recover most of the performance and it's way less intrusive.

   Paolo, could you share the Amazon's patch or the LML link? thanks.


Quan

> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/7] kvm pvtimer
  2017-12-14 12:06         ` Quan Xu
@ 2017-12-14 13:00           ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-12-14 13:00 UTC (permalink / raw)
  To: Quan Xu, Konrad Rzeszutek Wilk
  Cc: Radim Krcmar, Yang Zhang, kvm, LKML, Ben Luo

On 14/12/2017 13:06, Quan Xu wrote:
> 
> 
> On 2017/12/14 19:56, Paolo Bonzini wrote:
>> On 13/12/2017 17:28, Konrad Rzeszutek Wilk wrote:
>>> 1) VM idle path and network req/resp services:
>>>
>>> Does this go away if you don't hit the idle path? Meaning if you
>>> loop without hitting HLT/MWAIT? I am assuming the issue you are facing
>>> is the latency - that is first time the guest comes from HLT and
>>> responds to the packet the latency is much higher than without?
>>>
>>> And the arming of the timer?
>>> 2) process context switches.
>>>
>>> Is that related to the 1)? That is the 'schedule' call and the process
>>> going to sleep waiting for an interrupt or timer?
>>>
>>> This all sounds like issues with low-CPU usage workloads where you
>>> need low latency responses?
>> Even high-CPU usage, as long as there is a small idle time.  The cost of
>> setting the TSC deadline timer twice is about 3000 cycles.
>>
>> However, I think Amazon's approach of not intercepting HLT/MWAIT/PAUSE
>> can recover most of the performance and it's way less intrusive.
> 
> Paolo, could you share the Amazon's patch or the LML link? thanks.

Here: https://www.spinics.net/lists/kvm/msg159356.html

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-12-14 13:01 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08  8:39 [PATCH RFC 0/7] kvm pvtimer Quan Xu
2017-12-08  8:39 ` [PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR Quan Xu
2017-12-08  8:39 ` [PATCH RFC 2/7] kvm: x86: add a function to exchange value Quan Xu
2017-12-08  8:39 ` [PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest Quan Xu
2017-12-08 15:06   ` Konrad Rzeszutek Wilk
2017-12-14  1:54     ` Quan Xu
2017-12-08  8:39 ` [PATCH RFC 4/7] KVM: timer: program timer to a dedicated CPU Quan Xu
2017-12-08  8:39 ` [PATCH RFC 5/7] KVM: timer: ignore timer migration if pvtimer is enabled Quan Xu
2017-12-08  8:39 ` [PATCH RFC 6/7] Doc/KVM: introduce a new cpuid bit for kvm pvtimer Quan Xu
2017-12-08  8:39 ` [PATCH RFC 7/7] kvm: guest: reprogram guest timer Quan Xu
2017-12-08 15:10 ` [PATCH RFC 0/7] kvm pvtimer Konrad Rzeszutek Wilk
     [not found]   ` <CAFv8KnF1Re7Zn2LVvqh=Sr8MmaWJJwZBB1i5ws03GvPVKXRjzA@mail.gmail.com>
2017-12-13 16:28     ` Konrad Rzeszutek Wilk
2017-12-14  2:32       ` Quan Xu
2017-12-14 11:56       ` Paolo Bonzini
2017-12-14 12:06         ` Quan Xu
2017-12-14 13:00           ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.