linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
@ 2019-06-21  9:39 Wanpeng Li
  2019-06-21  9:39 ` [PATCH v5 1/4] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-06-21  9:39 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

Dedicated instances are currently disturbed by unnecessary jitter due 
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both 
programming timer in guest and the emulated timer fires incur vmexits.
This patchset tries to avoid vmexit which is incurred by the emulated 
timer fires in dedicated instance scenario. 

When nohz_full is enabled in dedicated instances scenario, the unpinned 
timer will be moved to the nearest busy housekeepers after commit
9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit 
444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However, 
KVM always makes lapic timer pinned to the pCPU which vCPU residents, the 
reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer 
pinned). Actually, these emulated timers can be offload to the housekeeping 
cpus since APICv is really common in recent years. The guest timer interrupt 
is injected by posted-interrupt which is delivered by housekeeping cpu 
once the emulated timer fires. 

The host admin should fine tuned, e.g. dedicated instances scenario w/ 
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patchset:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

w/ patchset:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time

EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>

v4 -> v5:
 * update patch description in patch 1/4
 * feed latest apic->lapic_timer.expired_tscdeadline to kvm_wait_lapic_expire()
 * squash advance timer handling to patch 2/4

v3 -> v4:
 * drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
 * don't posted inject already-expired timer

v2 -> v3:
 * disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
 * check kvm_hlt_in_guest instead

v1 -> v2:
 * check vcpu_halt_in_guest
 * move module parameter from kvm-intel to kvm
 * add housekeeping_enabled
 * rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs


Wanpeng Li (4):
  KVM: LAPIC: Make lapic timer unpinned
  KVM: LAPIC: Inject timer interrupt via posted interrupt
  KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
  KVM: LAPIC: Don't inject already-expired timer via posted interrupt

 arch/x86/kvm/lapic.c            | 68 +++++++++++++++++++++++++++--------------
 arch/x86/kvm/lapic.h            |  3 +-
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  5 +--
 arch/x86/kvm/x86.c              | 11 ++++---
 arch/x86/kvm/x86.h              |  2 ++
 include/linux/sched/isolation.h |  2 ++
 kernel/sched/isolation.c        |  6 ++++
 8 files changed, 67 insertions(+), 32 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v5 1/4] KVM: LAPIC: Make lapic timer unpinned
  2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
@ 2019-06-21  9:39 ` Wanpeng Li
  2019-06-21  9:40 ` [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt Wanpeng Li
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-06-21  9:39 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

Commit 61abdbe0bcc2 ("kvm: x86: make lapic hrtimer pinned") pinned the 
lapic timer to avoid to wait until the next kvm exit for the guest to 
see KVM_REQ_PENDING_TIMER set. There is another solution to give a kick 
after setting the KVM_REQ_PENDING_TIMER bit, make lapic timer unpinned 
will be used in follow up patches.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 8 ++++----
 arch/x86/kvm/x86.c   | 6 +-----
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e82a18c..87ecb56 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1578,7 +1578,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 	    likely(ns > apic->lapic_timer.timer_advance_ns)) {
 		expire = ktime_add_ns(now, ns);
 		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
-		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
 	} else
 		apic_timer_expired(apic);
 
@@ -1680,7 +1680,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
 	hrtimer_start(&apic->lapic_timer.timer,
 		apic->lapic_timer.target_expiration,
-		HRTIMER_MODE_ABS_PINNED);
+		HRTIMER_MODE_ABS);
 }
 
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
@@ -2317,7 +2317,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	apic->vcpu = vcpu;
 
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
-		     HRTIMER_MODE_ABS_PINNED);
+		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = 1000;
@@ -2506,7 +2506,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb5cd25..d7c757d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
 
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
-	 * vcpu_enter_guest.  This function is only called from
-	 * the physical CPU that is running vcpu.
-	 */
 	kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+	kvm_vcpu_kick(vcpu);
 }
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt
  2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
  2019-06-21  9:39 ` [PATCH v5 1/4] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
@ 2019-06-21  9:40 ` Wanpeng Li
  2019-07-05 12:37   ` Paolo Bonzini
  2019-06-21  9:40 ` [PATCH v5 3/4] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-06-21  9:40 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

Dedicated instances are currently disturbed by unnecessary jitter due 
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both 
programming timer in guest and the emulated timer fires incur vmexits.
This patch tries to avoid vmexit which is incurred by the emulated 
timer fires in dedicated instance scenario. 

When nohz_full is enabled in dedicated instances scenario, the emulated 
timers can be offload to the nearest busy housekeeping cpus since APICv 
is really common in recent years. The guest timer interrupt is injected 
by posted-interrupt which is delivered by housekeeping cpu once the emulated 
timer fires. 

The host admin should fine tuned, e.g. dedicated instances scenario w/ 
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

w/ patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time

EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c            | 45 ++++++++++++++++++++++++++++++-----------
 arch/x86/kvm/lapic.h            |  3 ++-
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  5 +++--
 arch/x86/kvm/x86.c              |  5 +++++
 arch/x86/kvm/x86.h              |  2 ++
 include/linux/sched/isolation.h |  2 ++
 kernel/sched/isolation.c        |  6 ++++++
 8 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 87ecb56..8869d30 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
 	return apic->vcpu->vcpu_id;
 }
 
+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
+{
+	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
+		kvm_hlt_in_guest(vcpu->kvm);
+}
+EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
+
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
 	switch (map->mode) {
@@ -1432,6 +1439,19 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 	}
 }
 
+static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
+{
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+
+	kvm_apic_local_deliver(apic, APIC_LVTT);
+	if (apic_lvtt_tscdeadline(apic))
+		ktimer->tscdeadline = 0;
+	if (apic_lvtt_oneshot(apic)) {
+		ktimer->tscdeadline = 0;
+		ktimer->target_expiration = 0;
+	}
+}
+
 static void apic_timer_expired(struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
@@ -1441,6 +1461,16 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	if (atomic_read(&apic->lapic_timer.pending))
 		return;
 
+	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
+		ktimer->expired_tscdeadline = ktimer->tscdeadline;
+
+	if (posted_interrupt_inject_timer(apic->vcpu)) {
+		if (apic->lapic_timer.timer_advance_ns)
+			kvm_wait_lapic_expire(vcpu, true);
+		kvm_apic_inject_pending_timer_irqs(apic);
+		return;
+	}
+
 	atomic_inc(&apic->lapic_timer.pending);
 	kvm_set_pending_timer(vcpu);
 
@@ -1450,9 +1480,6 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	 */
 	if (swait_active(q))
 		swake_up_one(q);
-
-	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
-		ktimer->expired_tscdeadline = ktimer->tscdeadline;
 }
 
 /*
@@ -1528,7 +1555,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u64 guest_tsc, tsc_deadline;
@@ -1536,7 +1563,7 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (apic->lapic_timer.expired_tscdeadline == 0)
 		return;
 
-	if (!lapic_timer_int_injected(vcpu))
+	if (!lapic_timer_int_injected(vcpu) && !pi_inject)
 		return;
 
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
@@ -2373,13 +2400,7 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	if (atomic_read(&apic->lapic_timer.pending) > 0) {
-		kvm_apic_local_deliver(apic, APIC_LVTT);
-		if (apic_lvtt_tscdeadline(apic))
-			apic->lapic_timer.tscdeadline = 0;
-		if (apic_lvtt_oneshot(apic)) {
-			apic->lapic_timer.tscdeadline = 0;
-			apic->lapic_timer.target_expiration = 0;
-		}
+		kvm_apic_inject_pending_timer_irqs(apic);
 		atomic_set(&apic->lapic_timer.pending, 0);
 	}
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3674717..3d8a043 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -225,7 +225,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject);
 
 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
@@ -236,6 +236,7 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
 void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu);
 
 static inline enum lapic_mode kvm_apic_mode(u64 apic_base)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bbc31f7..7e65de4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5648,7 +5648,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	if (lapic_in_kernel(vcpu) &&
 		vcpu->arch.apic->lapic_timer.timer_advance_ns)
-		kvm_wait_lapic_expire(vcpu);
+		kvm_wait_lapic_expire(vcpu, false);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d174b62..f74eb6a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6468,7 +6468,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	if (lapic_in_kernel(vcpu) &&
 		vcpu->arch.apic->lapic_timer.timer_advance_ns)
-		kvm_wait_lapic_expire(vcpu);
+		kvm_wait_lapic_expire(vcpu, false);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
@@ -7065,7 +7065,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 	u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
 	struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;
 
-	if (kvm_mwait_in_guest(vcpu->kvm))
+	if (kvm_mwait_in_guest(vcpu->kvm) ||
+		posted_interrupt_inject_timer(vcpu))
 		return -EOPNOTSUPP;
 
 	vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7c757d..74ee1f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -54,6 +54,7 @@
 #include <linux/kvm_irqfd.h>
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
+#include <linux/sched/isolation.h>
 #include <linux/mem_encrypt.h>
 
 #include <trace/events/kvm.h>
@@ -155,6 +156,9 @@ EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
 static bool __read_mostly force_emulation_prefix = false;
 module_param(force_emulation_prefix, bool, S_IRUGO);
 
+bool __read_mostly pi_inject_timer = 0;
+module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -7032,6 +7036,7 @@ int kvm_arch_init(void *opaque)
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 
 	kvm_lapic_init();
+	pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
 #ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e08a128..10b26f4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -301,6 +301,8 @@ extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
 
+extern bool pi_inject_timer;
+
 extern struct static_key kvm_no_apic_vcpu;
 
 static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index b0fb144..6fc5407 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -19,6 +19,7 @@ enum hk_flags {
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 extern int housekeeping_any_cpu(enum hk_flags flags);
 extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags);
+extern bool housekeeping_enabled(enum hk_flags flags);
 extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
 extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
 extern void __init housekeeping_init(void);
@@ -38,6 +39,7 @@ static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
 static inline void housekeeping_affine(struct task_struct *t,
 				       enum hk_flags flags) { }
 static inline void housekeeping_init(void) { }
+static inline bool housekeeping_enabled(enum hk_flags flags) { }
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 123ea07..ccb2808 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -14,6 +14,12 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
 static cpumask_var_t housekeeping_mask;
 static unsigned int housekeeping_flags;
 
+bool housekeeping_enabled(enum hk_flags flags)
+{
+	return !!(housekeeping_flags & flags);
+}
+EXPORT_SYMBOL_GPL(housekeeping_enabled);
+
 int housekeeping_any_cpu(enum hk_flags flags)
 {
 	if (static_branch_unlikely(&housekeeping_overridden))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 3/4] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
  2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
  2019-06-21  9:39 ` [PATCH v5 1/4] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
  2019-06-21  9:40 ` [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt Wanpeng Li
@ 2019-06-21  9:40 ` Wanpeng Li
  2019-06-21  9:40 ` [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt Wanpeng Li
  2019-07-02 16:38 ` [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Paolo Bonzini
  4 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-06-21  9:40 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

When lapic timer is injected by posted-interrupt, the emulated timer is
offload to the housekeeping cpu. The timer interrupt will be delivered
properly, no need to migrate timer.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8869d30..ae575c0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2522,7 +2522,8 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 {
 	struct hrtimer *timer;
 
-	if (!lapic_in_kernel(vcpu))
+	if (!lapic_in_kernel(vcpu) ||
+		posted_interrupt_inject_timer(vcpu))
 		return;
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt
  2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-06-21  9:40 ` [PATCH v5 3/4] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
@ 2019-06-21  9:40 ` Wanpeng Li
  2019-07-05 12:40   ` Paolo Bonzini
  2019-07-02 16:38 ` [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Paolo Bonzini
  4 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-06-21  9:40 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

already-expired timer interrupt can be injected to guest when vCPU who 
arms the lapic timer re-vmentry, don't posted inject in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ae575c0..7cd95ea 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1452,7 +1452,7 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
 	}
 }
 
-static void apic_timer_expired(struct kvm_lapic *apic)
+static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	struct swait_queue_head *q = &vcpu->wq;
@@ -1464,7 +1464,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
 		ktimer->expired_tscdeadline = ktimer->tscdeadline;
 
-	if (posted_interrupt_inject_timer(apic->vcpu)) {
+	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
 		if (apic->lapic_timer.timer_advance_ns)
 			kvm_wait_lapic_expire(vcpu, true);
 		kvm_apic_inject_pending_timer_irqs(apic);
@@ -1607,7 +1607,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
 		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
 	} else
-		apic_timer_expired(apic);
+		apic_timer_expired(apic, false);
 
 	local_irq_restore(flags);
 }
@@ -1697,7 +1697,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
 	if (ktime_after(ktime_get(),
 			apic->lapic_timer.target_expiration)) {
-		apic_timer_expired(apic);
+		apic_timer_expired(apic, false);
 
 		if (apic_lvtt_oneshot(apic))
 			return;
@@ -1759,7 +1759,7 @@ static bool start_hv_timer(struct kvm_lapic *apic)
 		if (atomic_read(&ktimer->pending)) {
 			cancel_hv_timer(apic);
 		} else if (expired) {
-			apic_timer_expired(apic);
+			apic_timer_expired(apic, false);
 			cancel_hv_timer(apic);
 		}
 	}
@@ -1809,7 +1809,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 		goto out;
 	WARN_ON(swait_active(&vcpu->wq));
 	cancel_hv_timer(apic);
-	apic_timer_expired(apic);
+	apic_timer_expired(apic, false);
 
 	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
 		advance_periodic_target_expiration(apic);
@@ -2312,7 +2312,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
 	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 	struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);
 
-	apic_timer_expired(apic);
+	apic_timer_expired(apic, true);
 
 	if (lapic_is_periodic(apic)) {
 		advance_periodic_target_expiration(apic);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
  2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
                   ` (3 preceding siblings ...)
  2019-06-21  9:40 ` [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt Wanpeng Li
@ 2019-07-02 16:38 ` Paolo Bonzini
  2019-07-02 22:23   ` Marcelo Tosatti
  4 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-02 16:38 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 21/06/19 11:39, Wanpeng Li wrote:
> Dedicated instances are currently disturbed by unnecessary jitter due 
> to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> There is no hardware virtual timer on Intel for guest like ARM. Both 
> programming timer in guest and the emulated timer fires incur vmexits.
> This patchset tries to avoid vmexit which is incurred by the emulated 
> timer fires in dedicated instance scenario. 
> 
> When nohz_full is enabled in dedicated instances scenario, the unpinned 
> timer will be moved to the nearest busy housekeepers after commit
> 9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit 
> 444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However, 
> KVM always makes lapic timer pinned to the pCPU which vCPU residents, the 
> reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer 
> pinned). Actually, these emulated timers can be offload to the housekeeping 
> cpus since APICv is really common in recent years. The guest timer interrupt 
> is injected by posted-interrupt which is delivered by housekeeping cpu 
> once the emulated timer fires. 
> 
> The host admin should fine tuned, e.g. dedicated instances scenario w/ 
> nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
> for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
> mode, ~3% redis performance benefit can be observed on Skylake server.

Marcelo,

does this patch work for you or can you still see the oops?

Thanks,

Paolo

> w/o patchset:
> 
>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> 
> EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> 
> w/ patchset:
> 
>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> 
> EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> 
> v4 -> v5:
>  * update patch description in patch 1/4
>  * feed latest apic->lapic_timer.expired_tscdeadline to kvm_wait_lapic_expire()
>  * squash advance timer handling to patch 2/4
> 
> v3 -> v4:
>  * drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
>  * don't posted inject already-expired timer
> 
> v2 -> v3:
>  * disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
>  * check kvm_hlt_in_guest instead
> 
> v1 -> v2:
>  * check vcpu_halt_in_guest
>  * move module parameter from kvm-intel to kvm
>  * add housekeeping_enabled
>  * rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs
> 
> 
> Wanpeng Li (4):
>   KVM: LAPIC: Make lapic timer unpinned
>   KVM: LAPIC: Inject timer interrupt via posted interrupt
>   KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
>   KVM: LAPIC: Don't inject already-expired timer via posted interrupt
> 
>  arch/x86/kvm/lapic.c            | 68 +++++++++++++++++++++++++++--------------
>  arch/x86/kvm/lapic.h            |  3 +-
>  arch/x86/kvm/svm.c              |  2 +-
>  arch/x86/kvm/vmx/vmx.c          |  5 +--
>  arch/x86/kvm/x86.c              | 11 ++++---
>  arch/x86/kvm/x86.h              |  2 ++
>  include/linux/sched/isolation.h |  2 ++
>  kernel/sched/isolation.c        |  6 ++++
>  8 files changed, 67 insertions(+), 32 deletions(-)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
  2019-07-02 16:38 ` [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Paolo Bonzini
@ 2019-07-02 22:23   ` Marcelo Tosatti
  2019-07-03  0:47     ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2019-07-02 22:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wanpeng Li, linux-kernel, kvm, Radim Krčmář

On Tue, Jul 02, 2019 at 06:38:56PM +0200, Paolo Bonzini wrote:
> On 21/06/19 11:39, Wanpeng Li wrote:
> > Dedicated instances are currently disturbed by unnecessary jitter due 
> > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > There is no hardware virtual timer on Intel for guest like ARM. Both 
> > programming timer in guest and the emulated timer fires incur vmexits.
> > This patchset tries to avoid vmexit which is incurred by the emulated 
> > timer fires in dedicated instance scenario. 
> > 
> > When nohz_full is enabled in dedicated instances scenario, the unpinned 
> > timer will be moved to the nearest busy housekeepers after commit
> > 9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit 
> > 444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However, 
> > KVM always makes lapic timer pinned to the pCPU which vCPU residents, the 
> > reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer 
> > pinned). Actually, these emulated timers can be offload to the housekeeping 
> > cpus since APICv is really common in recent years. The guest timer interrupt 
> > is injected by posted-interrupt which is delivered by housekeeping cpu 
> > once the emulated timer fires. 
> > 
> > The host admin should fine tuned, e.g. dedicated instances scenario w/ 
> > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
> > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
> > mode, ~3% redis performance benefit can be observed on Skylake server.
> 
> Marcelo,
> 
> does this patch work for you or can you still see the oops?

Hi Paolo,

No more oopses with kvm/queue. Can you include:

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -124,8 +124,7 @@ static inline u32 kvm_x2apic_id(struct k
 
 bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
 {
-	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
-		kvm_hlt_in_guest(vcpu->kvm);
+	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
 }
 EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
 
However, for some reason (hrtimer subsystems responsability) with cyclictest -i 200
on the guest, the timer runs on the local CPU:

       CPU 1/KVM-9454  [003] d..2   881.674196: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674200: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d.h.   881.674387: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   881.674393: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674395: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674399: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d.h.   881.674586: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   881.674593: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674595: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674599: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d.h.   881.674787: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   881.674793: get_nohz_timer_target: get_nohz_timer_target 3->0
       CPU 1/KVM-9454  [003] d..2   881.674795: get_nohz_timer_target: get_nohz_timer_target 3->0

But on boot:

       CPU 1/KVM-9454  [003] d..2   578.625394: get_nohz_timer_target: get_nohz_timer_target 3->0
          <idle>-0     [000] d.h1   578.626390: apic_timer_fn <-__hrtimer_run_queues
          <idle>-0     [000] d.h1   578.626394: apic_timer_fn<-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   578.626401: get_nohz_timer_target: get_nohz_timer_target 3->0
          <idle>-0     [000] d.h1   578.628397: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   578.628407: get_nohz_timer_target: get_nohz_timer_target 3->0
          <idle>-0     [000] d.h1   578.631403: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   578.631413: get_nohz_timer_target: get_nohz_timer_target 3->0
          <idle>-0     [000] d.h1   578.635409: apic_timer_fn <-__hrtimer_run_queues
       CPU 1/KVM-9454  [003] d..2   578.635419: get_nohz_timer_target: get_nohz_timer_target 3->0
          <idle>-0     [000] d.h1   578.640415: apic_timer_fn <-__hrtimer_run_queues

Thanks.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
  2019-07-02 22:23   ` Marcelo Tosatti
@ 2019-07-03  0:47     ` Wanpeng Li
  2019-07-03  1:01       ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Wanpeng Li @ 2019-07-03  0:47 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Wed, 3 Jul 2019 at 06:23, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Tue, Jul 02, 2019 at 06:38:56PM +0200, Paolo Bonzini wrote:
> > On 21/06/19 11:39, Wanpeng Li wrote:
> > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > programming timer in guest and the emulated timer fires incur vmexits.
> > > This patchset tries to avoid vmexit which is incurred by the emulated
> > > timer fires in dedicated instance scenario.
> > >
> > > When nohz_full is enabled in dedicated instances scenario, the unpinned
> > > timer will be moved to the nearest busy housekeepers after commit
> > > 9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit
> > > 444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However,
> > > KVM always makes lapic timer pinned to the pCPU which vCPU residents, the
> > > reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer
> > > pinned). Actually, these emulated timers can be offload to the housekeeping
> > > cpus since APICv is really common in recent years. The guest timer interrupt
> > > is injected by posted-interrupt which is delivered by housekeeping cpu
> > > once the emulated timer fires.
> > >
> > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > mode, ~3% redis performance benefit can be observed on Skylake server.
> >
> > Marcelo,
> >
> > does this patch work for you or can you still see the oops?
>
> Hi Paolo,
>
> No more oopses with kvm/queue. Can you include:

Cool, thanks for the confirm, Marcelo!

>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -124,8 +124,7 @@ static inline u32 kvm_x2apic_id(struct k
>
>  bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>  {
> -       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> -               kvm_hlt_in_guest(vcpu->kvm);
> +       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>
> However, for some reason (hrtimer subsystems responsability) with cyclictest -i 200
> on the guest, the timer runs on the local CPU:
>
>        CPU 1/KVM-9454  [003] d..2   881.674196: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674200: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d.h.   881.674387: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   881.674393: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674395: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674399: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d.h.   881.674586: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   881.674593: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674595: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674599: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d.h.   881.674787: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   881.674793: get_nohz_timer_target: get_nohz_timer_target 3->0
>        CPU 1/KVM-9454  [003] d..2   881.674795: get_nohz_timer_target: get_nohz_timer_target 3->0
>
> But on boot:
>
>        CPU 1/KVM-9454  [003] d..2   578.625394: get_nohz_timer_target: get_nohz_timer_target 3->0
>           <idle>-0     [000] d.h1   578.626390: apic_timer_fn <-__hrtimer_run_queues
>           <idle>-0     [000] d.h1   578.626394: apic_timer_fn<-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   578.626401: get_nohz_timer_target: get_nohz_timer_target 3->0
>           <idle>-0     [000] d.h1   578.628397: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   578.628407: get_nohz_timer_target: get_nohz_timer_target 3->0
>           <idle>-0     [000] d.h1   578.631403: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   578.631413: get_nohz_timer_target: get_nohz_timer_target 3->0
>           <idle>-0     [000] d.h1   578.635409: apic_timer_fn <-__hrtimer_run_queues
>        CPU 1/KVM-9454  [003] d..2   578.635419: get_nohz_timer_target: get_nohz_timer_target 3->0
>           <idle>-0     [000] d.h1   578.640415: apic_timer_fn <-__hrtimer_run_queues

You have an idle housekeeping cpu(cpu 0), however, most of
housekeeping cpus will be busy in product environment to avoid to
waste money. get_nohz_timer_target() will find a busy housekeeping cpu
but the timer migration will fail if the timer is the first expiring
timer on the new target(as the comments above the function
switch_hrtimer_base()). Please try taskset -c 0 stress --cpu 1 on your
host, you can observe(through /proc/timer_list) apic_timer_fn running
on cpu 0 most of the time and sporadically on local cpu.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
  2019-07-03  0:47     ` Wanpeng Li
@ 2019-07-03  1:01       ` Wanpeng Li
  0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2019-07-03  1:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Wed, 3 Jul 2019 at 08:47, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Wed, 3 Jul 2019 at 06:23, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Tue, Jul 02, 2019 at 06:38:56PM +0200, Paolo Bonzini wrote:
> > > On 21/06/19 11:39, Wanpeng Li wrote:
> > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > This patchset tries to avoid vmexit which is incurred by the emulated
> > > > timer fires in dedicated instance scenario.
> > > >
> > > > When nohz_full is enabled in dedicated instances scenario, the unpinned
> > > > timer will be moved to the nearest busy housekeepers after commit
> > > > 9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit
> > > > 444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However,
> > > > KVM always makes lapic timer pinned to the pCPU which vCPU residents, the
> > > > reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer
> > > > pinned). Actually, these emulated timers can be offload to the housekeeping
> > > > cpus since APICv is really common in recent years. The guest timer interrupt
> > > > is injected by posted-interrupt which is delivered by housekeeping cpu
> > > > once the emulated timer fires.
> > > >
> > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > >
> > > Marcelo,
> > >
> > > does this patch work for you or can you still see the oops?
> >
> > Hi Paolo,
> >
> > No more oopses with kvm/queue. Can you include:
>
> Cool, thanks for the confirm, Marcelo!
>
> >
> > Index: kvm/arch/x86/kvm/lapic.c
> > ===================================================================
> > --- kvm.orig/arch/x86/kvm/lapic.c
> > +++ kvm/arch/x86/kvm/lapic.c
> > @@ -124,8 +124,7 @@ static inline u32 kvm_x2apic_id(struct k
> >
> >  bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> >  {
> > -       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > -               kvm_hlt_in_guest(vcpu->kvm);
> > +       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
> >  }
> >  EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> >
> > However, for some reason (hrtimer subsystems responsability) with cyclictest -i 200
> > on the guest, the timer runs on the local CPU:
> >
> >        CPU 1/KVM-9454  [003] d..2   881.674196: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674200: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d.h.   881.674387: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   881.674393: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674395: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674399: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d.h.   881.674586: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   881.674593: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674595: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674599: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d.h.   881.674787: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   881.674793: get_nohz_timer_target: get_nohz_timer_target 3->0
> >        CPU 1/KVM-9454  [003] d..2   881.674795: get_nohz_timer_target: get_nohz_timer_target 3->0
> >
> > But on boot:
> >
> >        CPU 1/KVM-9454  [003] d..2   578.625394: get_nohz_timer_target: get_nohz_timer_target 3->0
> >           <idle>-0     [000] d.h1   578.626390: apic_timer_fn <-__hrtimer_run_queues
> >           <idle>-0     [000] d.h1   578.626394: apic_timer_fn<-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   578.626401: get_nohz_timer_target: get_nohz_timer_target 3->0
> >           <idle>-0     [000] d.h1   578.628397: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   578.628407: get_nohz_timer_target: get_nohz_timer_target 3->0
> >           <idle>-0     [000] d.h1   578.631403: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   578.631413: get_nohz_timer_target: get_nohz_timer_target 3->0
> >           <idle>-0     [000] d.h1   578.635409: apic_timer_fn <-__hrtimer_run_queues
> >        CPU 1/KVM-9454  [003] d..2   578.635419: get_nohz_timer_target: get_nohz_timer_target 3->0
> >           <idle>-0     [000] d.h1   578.640415: apic_timer_fn <-__hrtimer_run_queues
>
> You have an idle housekeeping cpu(cpu 0), however, most of
> housekeeping cpus will be busy in product environment to avoid to
> waste money. get_nohz_timer_target() will find a busy housekeeping cpu
> but the timer migration will fail if the timer is the first expiring
> timer on the new target(as the comments above the function
> switch_hrtimer_base()). Please try taskset -c 0 stress --cpu 1 on your
> host, you can observe(through /proc/timer_list) apic_timer_fn running
> on cpu 0 most of the time and sporadically on local cpu.

Or if you have a little bigger VM/multiple VMs, the apic_timer_fn from
all virtual lapics will make a housekeeping cpu busy. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt
  2019-06-21  9:40 ` [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt Wanpeng Li
@ 2019-07-05 12:37   ` Paolo Bonzini
       [not found]     ` <HK2PR02MB4145E122DC5AC2445137A10F80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
       [not found]     ` <HK2PR02MB4145BBA5B72DD70AC622FD0E80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
  0 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-05 12:37 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 21/06/19 11:40, Wanpeng Li wrote:
> +bool __read_mostly pi_inject_timer = 0;
> +module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);
> +
>  #define KVM_NR_SHARED_MSRS 16
>  
>  struct kvm_shared_msrs_global {
> @@ -7032,6 +7036,7 @@ int kvm_arch_init(void *opaque)
>  		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>  
>  	kvm_lapic_init();
> +	pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);

This overwrites whatever the user specified, so perhaps:

u8 __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);

...
	if (pi_inject_timer == -1)
		pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);

Also, do you want to disable the preemption timer if pi_inject_timer and
enable_apicv are both true?

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt
  2019-06-21  9:40 ` [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt Wanpeng Li
@ 2019-07-05 12:40   ` Paolo Bonzini
       [not found]     ` <HK2PR02MB4145B13227997511174DBFA480F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-05 12:40 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 21/06/19 11:40, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> already-expired timer interrupt can be injected to guest when vCPU who 
> arms the lapic timer re-vmentry, don't posted inject in this case.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index ae575c0..7cd95ea 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1452,7 +1452,7 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
>  	}
>  }
>  
> -static void apic_timer_expired(struct kvm_lapic *apic)
> +static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
>  {
>  	struct kvm_vcpu *vcpu = apic->vcpu;
>  	struct swait_queue_head *q = &vcpu->wq;
> @@ -1464,7 +1464,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
>  	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
>  		ktimer->expired_tscdeadline = ktimer->tscdeadline;
>  
> -	if (posted_interrupt_inject_timer(apic->vcpu)) {
> +	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {

Perhaps it should use a posted interrupt if kvm_arch_should_kick(vcpu),
i.e. just add kvm_arch_vcpu_should_kick(apic->vcpu) to
posted_interrupt_inject_timer?

Paolo

>  		if (apic->lapic_timer.timer_advance_ns)
>  			kvm_wait_lapic_expire(vcpu, true);
>  		kvm_apic_inject_pending_timer_irqs(apic);
> @@ -1607,7 +1607,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>  		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
>  		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
>  	} else
> -		apic_timer_expired(apic);
> +		apic_timer_expired(apic, false);
>  
>  	local_irq_restore(flags);
>  }
> @@ -1697,7 +1697,7 @@ static void start_sw_period(struct kvm_lapic *apic)
>  
>  	if (ktime_after(ktime_get(),
>  			apic->lapic_timer.target_expiration)) {
> -		apic_timer_expired(apic);
> +		apic_timer_expired(apic, false);
>  
>  		if (apic_lvtt_oneshot(apic))
>  			return;
> @@ -1759,7 +1759,7 @@ static bool start_hv_timer(struct kvm_lapic *apic)
>  		if (atomic_read(&ktimer->pending)) {
>  			cancel_hv_timer(apic);
>  		} else if (expired) {
> -			apic_timer_expired(apic);
> +			apic_timer_expired(apic, false);
>  			cancel_hv_timer(apic);
>  		}
>  	}
> @@ -1809,7 +1809,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>  		goto out;
>  	WARN_ON(swait_active(&vcpu->wq));
>  	cancel_hv_timer(apic);
> -	apic_timer_expired(apic);
> +	apic_timer_expired(apic, false);
>  
>  	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
>  		advance_periodic_target_expiration(apic);
> @@ -2312,7 +2312,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
>  	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
>  	struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);
>  
> -	apic_timer_expired(apic);
> +	apic_timer_expired(apic, true);
>  
>  	if (lapic_is_periodic(apic)) {
>  		advance_periodic_target_expiration(apic);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt
       [not found]     ` <HK2PR02MB4145E122DC5AC2445137A10F80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
@ 2019-07-05 13:00       ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-05 13:00 UTC (permalink / raw)
  To: Wanpeng Li, Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 05/07/19 14:52, Wanpeng Li wrote:
>> Also, do you want to disable the preemption timer if pi_inject_timer and
>> enable_apicv are both true?
> 
> I have already done this in vmx_set_hv_timer().

Right, sorry!

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt
       [not found]     ` <HK2PR02MB4145B13227997511174DBFA480F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
@ 2019-07-05 13:26       ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-05 13:26 UTC (permalink / raw)
  To: Wanpeng Li, Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 05/07/19 15:11, Wanpeng Li wrote:
> On 7/5/19 8:40 PM, Paolo Bonzini wrote:
> 
>> On 21/06/19 11:40, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>
>>> already-expired timer interrupt can be injected to guest when vCPU who
>>> arms the lapic timer re-vmentry, don't posted inject in this case.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>> ---
>>>   arch/x86/kvm/lapic.c | 14 +++++++-------
>>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index ae575c0..7cd95ea 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -1452,7 +1452,7 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
>>>   	}
>>>   }
>>>   
>>> -static void apic_timer_expired(struct kvm_lapic *apic)
>>> +static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
>>>   {
>>>   	struct kvm_vcpu *vcpu = apic->vcpu;
>>>   	struct swait_queue_head *q = &vcpu->wq;
>>> @@ -1464,7 +1464,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
>>>   	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
>>>   		ktimer->expired_tscdeadline = ktimer->tscdeadline;
>>>   
>>> -	if (posted_interrupt_inject_timer(apic->vcpu)) {
>>> +	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
>> Perhaps it should use a posted interrupt if kvm_arch_should_kick(vcpu),
>> i.e. just add kvm_arch_vcpu_should_kick(apic->vcpu) to
>> posted_interrupt_inject_timer?
> 
> So do you mean not nohz_full setup? An external interrupt is incurred 
> here and preemption timer is better.

No, I mean instead of adding can_pi_inject, just test
kvm_arch_should_kick in posted_interrupt_inject_timer, skipping the PI
if the vCPU is not running.  Instead just go down the normal path and
the guest will get the interrupt by checking the timer-pending flag.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt
       [not found]     ` <HK2PR02MB4145BBA5B72DD70AC622FD0E80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
@ 2019-07-05 13:42       ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-07-05 13:42 UTC (permalink / raw)
  To: Wanpeng Li, Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Marcelo Tosatti

On 05/07/19 15:32, Wanpeng Li wrote:
> -bool __read_mostly pi_inject_timer = 0;
> -module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);
> +int __read_mostly pi_inject_timer = -1;
> +module_param(pi_inject_timer, int, S_IRUGO | S_IWUSR);

Use "bint" instead of "int" please, so that it accepts 0/1 only and
prints as Y/N.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-07-05 13:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21  9:39 [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
2019-06-21  9:39 ` [PATCH v5 1/4] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
2019-06-21  9:40 ` [PATCH v5 2/4] KVM: LAPIC: Inject timer interrupt via posted interrupt Wanpeng Li
2019-07-05 12:37   ` Paolo Bonzini
     [not found]     ` <HK2PR02MB4145E122DC5AC2445137A10F80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
2019-07-05 13:00       ` Paolo Bonzini
     [not found]     ` <HK2PR02MB4145BBA5B72DD70AC622FD0E80F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
2019-07-05 13:42       ` Paolo Bonzini
2019-06-21  9:40 ` [PATCH v5 3/4] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
2019-06-21  9:40 ` [PATCH v5 4/4] KVM: LAPIC: Don't inject already-expired timer via posted interrupt Wanpeng Li
2019-07-05 12:40   ` Paolo Bonzini
     [not found]     ` <HK2PR02MB4145B13227997511174DBFA480F50@HK2PR02MB4145.apcprd02.prod.outlook.com>
2019-07-05 13:26       ` Paolo Bonzini
2019-07-02 16:38 ` [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer Paolo Bonzini
2019-07-02 22:23   ` Marcelo Tosatti
2019-07-03  0:47     ` Wanpeng Li
2019-07-03  1:01       ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).