KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer
@ 2019-06-17 11:24 Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

Dedicated instances are currently disturbed by unnecessary jitter due 
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both 
programming timer in guest and the emulated timer fires incur vmexits.
This patchset tries to avoid vmexit which is incurred by the emulated 
timer fires in dedicated instance scenario. 

When nohz_full is enabled in dedicated instances scenario, the unpinned 
timer will be moved to the nearest busy housekeepers after commit
9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit 
444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However, 
KVM always makes lapic timer pinned to the pCPU which vCPU residents, the 
reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer 
pinned). Actually, these emulated timers can be offload to the housekeeping 
cpus since APICv is really common in recent years. The guest timer interrupt 
is injected by posted-interrupt which is delivered by housekeeping cpu 
once the emulated timer fires. 

The host admin should fine tuned, e.g. dedicated instances scenario w/ 
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patchset:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

w/ patchset:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time

EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>

v3 -> v4:
 * drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
 * don't posted inject already-expired timer

v2 -> v3:
 * disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
 * check kvm_hlt_in_guest instead

v1 -> v2:
 * check vcpu_halt_in_guest
 * move module parameter from kvm-intel to kvm
 * add housekeeping_enabled
 * rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs

Wanpeng Li (5):
  KVM: LAPIC: Make lapic timer unpinned
  KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
  KVM: LAPIC: Don't posted inject already-expired timer
  KVM: LAPIC: add advance timer support to pi_inject_timer

 arch/x86/kvm/lapic.c            | 62 ++++++++++++++++++++++++++++-------------
 arch/x86/kvm/lapic.h            |  3 +-
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  5 ++--
 arch/x86/kvm/x86.c              | 11 ++++----
 arch/x86/kvm/x86.h              |  2 ++
 include/linux/sched/isolation.h |  2 ++
 kernel/sched/isolation.c        |  6 ++++
 8 files changed, 64 insertions(+), 29 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned
  2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
@ 2019-06-17 11:24 ` Wanpeng Li
  2019-06-17 11:48   ` Peter Xu
  2019-06-17 11:24 ` [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt Wanpeng Li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

Make lapic timer unpinned when timer is injected by posted-interrupt,
the emulated timer can be offload to the housekeeping cpus, kick after 
setting the pending timer request as alternative to commit 61abdbe0bcc.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 8 ++++----
 arch/x86/kvm/x86.c   | 6 +-----
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e82a18c..87ecb56 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1578,7 +1578,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 	    likely(ns > apic->lapic_timer.timer_advance_ns)) {
 		expire = ktime_add_ns(now, ns);
 		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
-		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
 	} else
 		apic_timer_expired(apic);
 
@@ -1680,7 +1680,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
 	hrtimer_start(&apic->lapic_timer.timer,
 		apic->lapic_timer.target_expiration,
-		HRTIMER_MODE_ABS_PINNED);
+		HRTIMER_MODE_ABS);
 }
 
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
@@ -2317,7 +2317,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	apic->vcpu = vcpu;
 
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
-		     HRTIMER_MODE_ABS_PINNED);
+		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = 1000;
@@ -2506,7 +2506,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a05a4e..9450a16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
 
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
-	 * vcpu_enter_guest.  This function is only called from
-	 * the physical CPU that is running vcpu.
-	 */
 	kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+	kvm_vcpu_kick(vcpu);
 }
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
@ 2019-06-17 11:24 ` Wanpeng Li
  2019-06-18 13:35   ` Marcelo Tosatti
  2019-06-17 11:24 ` [PATCH v4 3/5] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

Dedicated instances are currently disturbed by unnecessary jitter due 
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both 
programming timer in guest and the emulated timer fires incur vmexits.
This patch tries to avoid vmexit which is incurred by the emulated 
timer fires in dedicated instance scenario. 

When nohz_full is enabled in dedicated instances scenario, the emulated 
timers can be offload to the nearest busy housekeeping cpus since APICv 
is really common in recent years. The guest timer interrupt is injected 
by posted-interrupt which is delivered by housekeeping cpu once the emulated 
timer fires. 

The host admin should fine tuned, e.g. dedicated instances scenario w/ 
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

w/ patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time

EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
 arch/x86/kvm/lapic.h            |  1 +
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 arch/x86/kvm/x86.c              |  5 +++++
 arch/x86/kvm/x86.h              |  2 ++
 include/linux/sched/isolation.h |  2 ++
 kernel/sched/isolation.c        |  6 ++++++
 7 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 87ecb56..9ceeee5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
 	return apic->vcpu->vcpu_id;
 }
 
+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
+{
+	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
+		kvm_hlt_in_guest(vcpu->kvm);
+}
+EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
+
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
 	switch (map->mode) {
@@ -1432,6 +1439,19 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 	}
 }
 
+static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
+{
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+
+	kvm_apic_local_deliver(apic, APIC_LVTT);
+	if (apic_lvtt_tscdeadline(apic))
+		ktimer->tscdeadline = 0;
+	if (apic_lvtt_oneshot(apic)) {
+		ktimer->tscdeadline = 0;
+		ktimer->target_expiration = 0;
+	}
+}
+
 static void apic_timer_expired(struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
@@ -1441,6 +1461,11 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	if (atomic_read(&apic->lapic_timer.pending))
 		return;
 
+	if (posted_interrupt_inject_timer(apic->vcpu)) {
+		kvm_apic_inject_pending_timer_irqs(apic);
+		return;
+	}
+
 	atomic_inc(&apic->lapic_timer.pending);
 	kvm_set_pending_timer(vcpu);
 
@@ -2373,13 +2398,7 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	if (atomic_read(&apic->lapic_timer.pending) > 0) {
-		kvm_apic_local_deliver(apic, APIC_LVTT);
-		if (apic_lvtt_tscdeadline(apic))
-			apic->lapic_timer.tscdeadline = 0;
-		if (apic_lvtt_oneshot(apic)) {
-			apic->lapic_timer.tscdeadline = 0;
-			apic->lapic_timer.target_expiration = 0;
-		}
+		kvm_apic_inject_pending_timer_irqs(apic);
 		atomic_set(&apic->lapic_timer.pending, 0);
 	}
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3674717..e41936b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -236,6 +236,7 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
 void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu);
 
 static inline enum lapic_mode kvm_apic_mode(u64 apic_base)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8fbea03..f45c51e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7059,7 +7059,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 	u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
 	struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;
 
-	if (kvm_mwait_in_guest(vcpu->kvm))
+	if (kvm_mwait_in_guest(vcpu->kvm) ||
+		posted_interrupt_inject_timer(vcpu))
 		return -EOPNOTSUPP;
 
 	vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9450a16..dae18f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -54,6 +54,7 @@
 #include <linux/kvm_irqfd.h>
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
+#include <linux/sched/isolation.h>
 #include <linux/mem_encrypt.h>
 
 #include <trace/events/kvm.h>
@@ -155,6 +156,9 @@ EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
 static bool __read_mostly force_emulation_prefix = false;
 module_param(force_emulation_prefix, bool, S_IRUGO);
 
+bool __read_mostly pi_inject_timer = 0;
+module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -7032,6 +7036,7 @@ int kvm_arch_init(void *opaque)
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 
 	kvm_lapic_init();
+	pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
 #ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e08a128..10b26f4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -301,6 +301,8 @@ extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
 
+extern bool pi_inject_timer;
+
 extern struct static_key kvm_no_apic_vcpu;
 
 static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index b0fb144..6fc5407 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -19,6 +19,7 @@ enum hk_flags {
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 extern int housekeeping_any_cpu(enum hk_flags flags);
 extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags);
+extern bool housekeeping_enabled(enum hk_flags flags);
 extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
 extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
 extern void __init housekeeping_init(void);
@@ -38,6 +39,7 @@ static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
 static inline void housekeeping_affine(struct task_struct *t,
 				       enum hk_flags flags) { }
 static inline void housekeeping_init(void) { }
+static inline bool housekeeping_enabled(enum hk_flags flags) { }
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 123ea07..ccb2808 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -14,6 +14,12 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
 static cpumask_var_t housekeeping_mask;
 static unsigned int housekeeping_flags;
 
+bool housekeeping_enabled(enum hk_flags flags)
+{
+	return !!(housekeeping_flags & flags);
+}
+EXPORT_SYMBOL_GPL(housekeeping_enabled);
+
 int housekeeping_any_cpu(enum hk_flags flags)
 {
 	if (static_branch_unlikely(&housekeeping_overridden))
-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 3/5] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
  2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt Wanpeng Li
@ 2019-06-17 11:24 ` Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 4/5] KVM: LAPIC: Don't posted inject already-expired timer Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer Wanpeng Li
  4 siblings, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

When lapic timer is injected by posted-interrupt, the emulated timer is
offload to the housekeeping cpu. The timer interrupt will be delivered
properly, no need to migrate timer.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9ceeee5..665b1bb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2520,7 +2520,8 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 {
 	struct hrtimer *timer;
 
-	if (!lapic_in_kernel(vcpu))
+	if (!lapic_in_kernel(vcpu) ||
+		posted_interrupt_inject_timer(vcpu))
 		return;
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 4/5] KVM: LAPIC: Don't posted inject already-expired timer
  2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-06-17 11:24 ` [PATCH v4 3/5] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
@ 2019-06-17 11:24 ` Wanpeng Li
  2019-06-17 11:24 ` [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer Wanpeng Li
  4 siblings, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

already-expired timer interrupt can be injected to guest when vCPU who 
arms the lapic timer re-vmentry, don't posted inject in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 665b1bb..1a31389 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1452,7 +1452,7 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
 	}
 }
 
-static void apic_timer_expired(struct kvm_lapic *apic)
+static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	struct swait_queue_head *q = &vcpu->wq;
@@ -1461,7 +1461,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	if (atomic_read(&apic->lapic_timer.pending))
 		return;
 
-	if (posted_interrupt_inject_timer(apic->vcpu)) {
+	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
 		kvm_apic_inject_pending_timer_irqs(apic);
 		return;
 	}
@@ -1605,7 +1605,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
 		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
 	} else
-		apic_timer_expired(apic);
+		apic_timer_expired(apic, false);
 
 	local_irq_restore(flags);
 }
@@ -1695,7 +1695,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
 	if (ktime_after(ktime_get(),
 			apic->lapic_timer.target_expiration)) {
-		apic_timer_expired(apic);
+		apic_timer_expired(apic, false);
 
 		if (apic_lvtt_oneshot(apic))
 			return;
@@ -1757,7 +1757,7 @@ static bool start_hv_timer(struct kvm_lapic *apic)
 		if (atomic_read(&ktimer->pending)) {
 			cancel_hv_timer(apic);
 		} else if (expired) {
-			apic_timer_expired(apic);
+			apic_timer_expired(apic, false);
 			cancel_hv_timer(apic);
 		}
 	}
@@ -1807,7 +1807,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 		goto out;
 	WARN_ON(swait_active(&vcpu->wq));
 	cancel_hv_timer(apic);
-	apic_timer_expired(apic);
+	apic_timer_expired(apic, false);
 
 	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
 		advance_periodic_target_expiration(apic);
@@ -2310,7 +2310,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
 	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 	struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);
 
-	apic_timer_expired(apic);
+	apic_timer_expired(apic, true);
 
 	if (lapic_is_periodic(apic)) {
 		advance_periodic_target_expiration(apic);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer
  2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
                   ` (3 preceding siblings ...)
  2019-06-17 11:24 ` [PATCH v4 4/5] KVM: LAPIC: Don't posted inject already-expired timer Wanpeng Li
@ 2019-06-17 11:24 ` Wanpeng Li
  2019-06-17 21:32   ` Radim Krčmář
  4 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-17 11:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

From: Wanpeng Li <wanpengli@tencent.com>

Wait before calling posted-interrupt deliver function directly to add 
advance timer support to pi_inject_timer.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c   | 6 ++++--
 arch/x86/kvm/lapic.h   | 2 +-
 arch/x86/kvm/svm.c     | 2 +-
 arch/x86/kvm/vmx/vmx.c | 2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1a31389..1a31ba5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
 		return;
 
 	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
+		if (apic->lapic_timer.timer_advance_ns)
+			kvm_wait_lapic_expire(vcpu, true);
 		kvm_apic_inject_pending_timer_irqs(apic);
 		return;
 	}
@@ -1553,7 +1555,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u64 guest_tsc, tsc_deadline;
@@ -1561,7 +1563,7 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (apic->lapic_timer.expired_tscdeadline == 0)
 		return;
 
-	if (!lapic_timer_int_injected(vcpu))
+	if (!lapic_timer_int_injected(vcpu) && !pi_inject)
 		return;
 
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index e41936b..3d8a043 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -225,7 +225,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject);
 
 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bbc31f7..7e65de4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5648,7 +5648,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	if (lapic_in_kernel(vcpu) &&
 		vcpu->arch.apic->lapic_timer.timer_advance_ns)
-		kvm_wait_lapic_expire(vcpu);
+		kvm_wait_lapic_expire(vcpu, false);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f45c51e..718a3ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6462,7 +6462,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	if (lapic_in_kernel(vcpu) &&
 		vcpu->arch.apic->lapic_timer.timer_advance_ns)
-		kvm_wait_lapic_expire(vcpu);
+		kvm_wait_lapic_expire(vcpu, false);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned
  2019-06-17 11:24 ` [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
@ 2019-06-17 11:48   ` Peter Xu
  2019-06-18  0:38     ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2019-06-17 11:48 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Marcelo Tosatti

On Mon, Jun 17, 2019 at 07:24:43PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Make lapic timer unpinned when timer is injected by posted-interrupt,

It has nothing to do with PI, yet?

And, how about mentioning 61abdbe0bc and telling that this could be
another solution for that problem (but will be used in follow up
patches)?

> the emulated timer can be offload to the housekeeping cpus, kick after 
> setting the pending timer request as alternative to commit 61abdbe0bcc.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 8 ++++----
>  arch/x86/kvm/x86.c   | 6 +-----
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e82a18c..87ecb56 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1578,7 +1578,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>  	    likely(ns > apic->lapic_timer.timer_advance_ns)) {
>  		expire = ktime_add_ns(now, ns);
>  		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
> -		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
> +		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
>  	} else
>  		apic_timer_expired(apic);
>  
> @@ -1680,7 +1680,7 @@ static void start_sw_period(struct kvm_lapic *apic)
>  
>  	hrtimer_start(&apic->lapic_timer.timer,
>  		apic->lapic_timer.target_expiration,
> -		HRTIMER_MODE_ABS_PINNED);
> +		HRTIMER_MODE_ABS);
>  }
>  
>  bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
> @@ -2317,7 +2317,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>  	apic->vcpu = vcpu;
>  
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
> -		     HRTIMER_MODE_ABS_PINNED);
> +		     HRTIMER_MODE_ABS);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
>  	if (timer_advance_ns == -1) {
>  		apic->lapic_timer.timer_advance_ns = 1000;
> @@ -2506,7 +2506,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>  
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
> +		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
>  }
>  
>  /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0a05a4e..9450a16 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
>  
>  void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
>  {
> -	/*
> -	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
> -	 * vcpu_enter_guest.  This function is only called from
> -	 * the physical CPU that is running vcpu.
> -	 */
>  	kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> +	kvm_vcpu_kick(vcpu);
>  }
>  
>  static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer
  2019-06-17 11:24 ` [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer Wanpeng Li
@ 2019-06-17 21:32   ` Radim Krčmář
  2019-06-18  0:44     ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Radim Krčmář @ 2019-06-17 21:32 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Marcelo Tosatti

2019-06-17 19:24+0800, Wanpeng Li:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Wait before calling posted-interrupt deliver function directly to add 
> advance timer support to pi_inject_timer.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---

Please merge this patch with [2/5], so bisection doesn't break.

>  arch/x86/kvm/lapic.c   | 6 ++++--
>  arch/x86/kvm/lapic.h   | 2 +-
>  arch/x86/kvm/svm.c     | 2 +-
>  arch/x86/kvm/vmx/vmx.c | 2 +-
>  4 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 1a31389..1a31ba5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
>  		return;
>  
>  	if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> +		if (apic->lapic_timer.timer_advance_ns)
> +			kvm_wait_lapic_expire(vcpu, true);

From where does kvm_wait_lapic_expire() take
apic->lapic_timer.expired_tscdeadline?

(I think it would be best to take the functional core of
 kvm_wait_lapic_expire() and make it into a function that takes the
 expired_tscdeadline as an argument.)

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned
  2019-06-17 11:48   ` Peter Xu
@ 2019-06-18  0:38     ` Wanpeng Li
  0 siblings, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-18  0:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: LKML, kvm, Paolo Bonzini, Radim Krčmář, Marcelo Tosatti

On Mon, 17 Jun 2019 at 19:48, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 17, 2019 at 07:24:43PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Make lapic timer unpinned when timer is injected by posted-interrupt,
>
> It has nothing to do with PI, yet?
>
> And, how about mentioning 61abdbe0bc and telling that this could be
> another solution for that problem (but will be used in follow up
> patches)?
>
> > the emulated timer can be offload to the housekeeping cpus, kick after
> > setting the pending timer request as alternative to commit 61abdbe0bcc.

Yeah, the patch description needs to rework. Thank you.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer
  2019-06-17 21:32   ` Radim Krčmář
@ 2019-06-18  0:44     ` Wanpeng Li
  2019-06-18  0:57       ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-18  0:44 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: LKML, kvm, Paolo Bonzini, Marcelo Tosatti

On Tue, 18 Jun 2019 at 05:32, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2019-06-17 19:24+0800, Wanpeng Li:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Wait before calling posted-interrupt deliver function directly to add
> > advance timer support to pi_inject_timer.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
>
> Please merge this patch with [2/5], so bisection doesn't break.

Agreed.

>
> >  arch/x86/kvm/lapic.c   | 6 ++++--
> >  arch/x86/kvm/lapic.h   | 2 +-
> >  arch/x86/kvm/svm.c     | 2 +-
> >  arch/x86/kvm/vmx/vmx.c | 2 +-
> >  4 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 1a31389..1a31ba5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
> >               return;
> >
> >       if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> > +             if (apic->lapic_timer.timer_advance_ns)
> > +                     kvm_wait_lapic_expire(vcpu, true);
>
> From where does kvm_wait_lapic_expire() take
> apic->lapic_timer.expired_tscdeadline?

Sorry, I failed to understand this.
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/arch/x86/kvm/lapic.c?h=queue#n1541
We can get apic->lapic_timer.expired_tscdeadline in
kvm_wait_lapic_expire() directly.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer
  2019-06-18  0:44     ` Wanpeng Li
@ 2019-06-18  0:57       ` Wanpeng Li
  0 siblings, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-18  0:57 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: LKML, kvm, Paolo Bonzini, Marcelo Tosatti

On Tue, 18 Jun 2019 at 08:44, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Tue, 18 Jun 2019 at 05:32, Radim Krčmář <rkrcmar@redhat.com> wrote:
> >
> > 2019-06-17 19:24+0800, Wanpeng Li:
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > Wait before calling posted-interrupt deliver function directly to add
> > > advance timer support to pi_inject_timer.
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > ---
> >
> > Please merge this patch with [2/5], so bisection doesn't break.
>
> Agreed.
>
> >
> > >  arch/x86/kvm/lapic.c   | 6 ++++--
> > >  arch/x86/kvm/lapic.h   | 2 +-
> > >  arch/x86/kvm/svm.c     | 2 +-
> > >  arch/x86/kvm/vmx/vmx.c | 2 +-
> > >  4 files changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 1a31389..1a31ba5 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
> > >               return;
> > >
> > >       if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> > > +             if (apic->lapic_timer.timer_advance_ns)
> > > +                     kvm_wait_lapic_expire(vcpu, true);
> >
> > From where does kvm_wait_lapic_expire() take
> > apic->lapic_timer.expired_tscdeadline?
>
> Sorry, I failed to understand this.
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/arch/x86/kvm/lapic.c?h=queue#n1541
> We can get apic->lapic_timer.expired_tscdeadline in
> kvm_wait_lapic_expire() directly.

Oh, miss the latest expired_tscdeadline, how about something like below?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1a31ba5..7cd95ea 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1461,6 +1461,9 @@ static void apic_timer_expired(struct kvm_lapic
*apic, bool can_pi_inject)
     if (atomic_read(&apic->lapic_timer.pending))
         return;

+    if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
+        ktimer->expired_tscdeadline = ktimer->tscdeadline;
+
     if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
         if (apic->lapic_timer.timer_advance_ns)
             kvm_wait_lapic_expire(vcpu, true);
@@ -1477,9 +1480,6 @@ static void apic_timer_expired(struct kvm_lapic
*apic, bool can_pi_inject)
      */
     if (swait_active(q))
         swake_up_one(q);
-
-    if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
-        ktimer->expired_tscdeadline = ktimer->tscdeadline;
 }

 /*

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-17 11:24 ` [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt Wanpeng Li
@ 2019-06-18 13:35   ` Marcelo Tosatti
  2019-06-19  0:36     ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2019-06-18 13:35 UTC (permalink / raw)
  To: Wanpeng Li, Paolo Bonzini
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář

On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Dedicated instances are currently disturbed by unnecessary jitter due 
> to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> There is no hardware virtual timer on Intel for guest like ARM. Both 
> programming timer in guest and the emulated timer fires incur vmexits.
> This patch tries to avoid vmexit which is incurred by the emulated 
> timer fires in dedicated instance scenario. 
> 
> When nohz_full is enabled in dedicated instances scenario, the emulated 
> timers can be offload to the nearest busy housekeeping cpus since APICv 
> is really common in recent years. The guest timer interrupt is injected 
> by posted-interrupt which is delivered by housekeeping cpu once the emulated 
> timer fires. 
> 
> The host admin should fine tuned, e.g. dedicated instances scenario w/ 
> nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus 
> for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root  
> mode, ~3% redis performance benefit can be observed on Skylake server.
> 
> w/o patch:
> 
>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> 
> EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> 
> w/ patch:
> 
>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> 
> EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
>  arch/x86/kvm/lapic.h            |  1 +
>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>  arch/x86/kvm/x86.c              |  5 +++++
>  arch/x86/kvm/x86.h              |  2 ++
>  include/linux/sched/isolation.h |  2 ++
>  kernel/sched/isolation.c        |  6 ++++++
>  7 files changed, 44 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 87ecb56..9ceeee5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>  	return apic->vcpu->vcpu_id;
>  }
>  
> +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> +{
> +	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> +		kvm_hlt_in_guest(vcpu->kvm);
> +}
> +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

Paolo, can you explain the reasoning behind this?

Should not be necessary...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-18 13:35   ` Marcelo Tosatti
@ 2019-06-19  0:36     ` Wanpeng Li
  2019-06-19 21:03       ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-19  0:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Dedicated instances are currently disturbed by unnecessary jitter due
> > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > There is no hardware virtual timer on Intel for guest like ARM. Both
> > programming timer in guest and the emulated timer fires incur vmexits.
> > This patch tries to avoid vmexit which is incurred by the emulated
> > timer fires in dedicated instance scenario.
> >
> > When nohz_full is enabled in dedicated instances scenario, the emulated
> > timers can be offload to the nearest busy housekeeping cpus since APICv
> > is really common in recent years. The guest timer interrupt is injected
> > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > timer fires.
> >
> > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > mode, ~3% redis performance benefit can be observed on Skylake server.
> >
> > w/o patch:
> >
> >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> >
> > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> >
> > w/ patch:
> >
> >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> >
> > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> >  arch/x86/kvm/lapic.h            |  1 +
> >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> >  arch/x86/kvm/x86.c              |  5 +++++
> >  arch/x86/kvm/x86.h              |  2 ++
> >  include/linux/sched/isolation.h |  2 ++
> >  kernel/sched/isolation.c        |  6 ++++++
> >  7 files changed, 44 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 87ecb56..9ceeee5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> >       return apic->vcpu->vcpu_id;
> >  }
> >
> > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > +{
> > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > +             kvm_hlt_in_guest(vcpu->kvm);
> > +}
> > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>
> Paolo, can you explain the reasoning behind this?
>
> Should not be necessary...

Here some new discussions:
https://lkml.org/lkml/2019/6/13/1423
https://lkml.org/lkml/2019/6/13/1420

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-19  0:36     ` Wanpeng Li
@ 2019-06-19 21:03       ` Marcelo Tosatti
  2019-06-20  0:52         ` Wanpeng Li
  2019-06-21  1:42         ` Wanpeng Li
  0 siblings, 2 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2019-06-19 21:03 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

Hi Li,

On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > programming timer in guest and the emulated timer fires incur vmexits.
> > > This patch tries to avoid vmexit which is incurred by the emulated
> > > timer fires in dedicated instance scenario.
> > >
> > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > is really common in recent years. The guest timer interrupt is injected
> > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > timer fires.
> > >
> > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > >
> > > w/o patch:
> > >
> > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > >
> > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > >
> > > w/ patch:
> > >
> > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > >
> > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > ---
> > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > >  arch/x86/kvm/lapic.h            |  1 +
> > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > >  arch/x86/kvm/x86.c              |  5 +++++
> > >  arch/x86/kvm/x86.h              |  2 ++
> > >  include/linux/sched/isolation.h |  2 ++
> > >  kernel/sched/isolation.c        |  6 ++++++
> > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 87ecb56..9ceeee5 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > >       return apic->vcpu->vcpu_id;
> > >  }
> > >
> > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > +{
> > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > +}
> > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> >
> > Paolo, can you explain the reasoning behind this?
> >
> > Should not be necessary...
> 
> Here some new discussions:
> https://lkml.org/lkml/2019/6/13/1423

Not sure what this has to do with injecting timer
interrupts via posted interrupts ?

> https://lkml.org/lkml/2019/6/13/1420

Two things (unrelated to the above):

1) hrtimer_reprogram is unable to wakeup a remote vCPU, therefore 
i believe execution of apic_timer_expired can be delayed. 
Should wakeup the CPU which hosts apic_timer_expired.


        /*
         * If the timer is not on the current cpu, we cannot reprogram
         * the other cpus clock event device.
         */
        if (base->cpu_base != cpu_base)
                return;

2) Getting an oops when running cyclictest, debugging...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-19 21:03       ` Marcelo Tosatti
@ 2019-06-20  0:52         ` Wanpeng Li
  2019-06-21  1:42         ` Wanpeng Li
  1 sibling, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-20  0:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> Hi Li,
>
> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > >
> > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > timer fires in dedicated instance scenario.
> > > >
> > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > is really common in recent years. The guest timer interrupt is injected
> > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > timer fires.
> > > >
> > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > >
> > > > w/o patch:
> > > >
> > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > >
> > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > >
> > > > w/ patch:
> > > >
> > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > >
> > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > >
> > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > ---
> > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > >  arch/x86/kvm/lapic.h            |  1 +
> > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > >  arch/x86/kvm/x86.h              |  2 ++
> > > >  include/linux/sched/isolation.h |  2 ++
> > > >  kernel/sched/isolation.c        |  6 ++++++
> > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 87ecb56..9ceeee5 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > >       return apic->vcpu->vcpu_id;
> > > >  }
> > > >
> > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > >
> > > Paolo, can you explain the reasoning behind this?
> > >
> > > Should not be necessary...
> >
> > Here some new discussions:
> > https://lkml.org/lkml/2019/6/13/1423
>
> Not sure what this has to do with injecting timer
> interrupts via posted interrupts ?

Yeah, need more explain from Paolo! Ping Paolo,

>
> > https://lkml.org/lkml/2019/6/13/1420
>
> Two things (unrelated to the above):
>
> 1) hrtimer_reprogram is unable to wakeup a remote vCPU, therefore
> i believe execution of apic_timer_expired can be delayed.
> Should wakeup the CPU which hosts apic_timer_expired.
>
>
>         /*
>          * If the timer is not on the current cpu, we cannot reprogram
>          * the other cpus clock event device.
>          */
>         if (base->cpu_base != cpu_base)
>                 return;

If it is not the first expiring timer on the new target, we don't need
to reprogram. It can't be the first expired timer on the new target
since below:

/*
 * We switch the timer base to a power-optimized selected CPU target,
 * if:
 * - NO_HZ_COMMON is enabled
 * - timer migration is enabled
 * - the timer callback is not running
 * - the timer is not the first expiring timer on the new target
 *
 * If one of the above requirements is not fulfilled we move the timer
 * to the current CPU or leave it on the previously assigned CPU if
 * the timer callback is currently running.
 */
static inline struct hrtimer_clock_base *
switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
   int pinned)

>
> 2) Getting an oops when running cyclictest, debugging...

Radim point out one issue in patch 5/5, not sure that is cause.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-19 21:03       ` Marcelo Tosatti
  2019-06-20  0:52         ` Wanpeng Li
@ 2019-06-21  1:42         ` Wanpeng Li
  2019-06-21 21:42           ` Marcelo Tosatti
  1 sibling, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-21  1:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> Hi Li,
>
> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > >
> > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > timer fires in dedicated instance scenario.
> > > >
> > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > is really common in recent years. The guest timer interrupt is injected
> > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > timer fires.
> > > >
> > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > >
> > > > w/o patch:
> > > >
> > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > >
> > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > >
> > > > w/ patch:
> > > >
> > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > >
> > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > >
> > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > ---
> > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > >  arch/x86/kvm/lapic.h            |  1 +
> > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > >  arch/x86/kvm/x86.h              |  2 ++
> > > >  include/linux/sched/isolation.h |  2 ++
> > > >  kernel/sched/isolation.c        |  6 ++++++
> > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 87ecb56..9ceeee5 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > >       return apic->vcpu->vcpu_id;
> > > >  }
> > > >
> > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > >
> > > Paolo, can you explain the reasoning behind this?
> > >
> > > Should not be necessary...

https://lkml.org/lkml/2019/6/5/436  "Here you need to check
kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
through kvm_apic_expired if the guest needs to be woken up from
kvm_vcpu_block."

I think we can still be woken up from kvm_vcpu_block() if pir is set.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-21  1:42         ` Wanpeng Li
@ 2019-06-21 21:42           ` Marcelo Tosatti
  2019-06-24  8:53             ` Wanpeng Li
  2019-06-25 17:02             ` Paolo Bonzini
  0 siblings, 2 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2019-06-21 21:42 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > Hi Li,
> >
> > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > >
> > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > timer fires in dedicated instance scenario.
> > > > >
> > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > timer fires.
> > > > >
> > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > >
> > > > > w/o patch:
> > > > >
> > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > >
> > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > >
> > > > > w/ patch:
> > > > >
> > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > >
> > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > >
> > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > ---
> > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > >  include/linux/sched/isolation.h |  2 ++
> > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index 87ecb56..9ceeee5 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > >       return apic->vcpu->vcpu_id;
> > > > >  }
> > > > >
> > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > >
> > > > Paolo, can you explain the reasoning behind this?
> > > >
> > > > Should not be necessary...
> 
> https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> through kvm_apic_expired if the guest needs to be woken up from
> kvm_vcpu_block."

Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
be woken up, if it receives a timer interrupt.

But your patch will go through:

kvm_apic_inject_pending_timer_irqs
__apic_accept_irq -> 
vmx_deliver_posted_interrupt ->
kvm_vcpu_trigger_posted_interrupt returns false
(because vcpu->mode != IN_GUEST_MODE) ->
kvm_vcpu_kick

Which will wakeup the vcpu.

Apart from this oops, which triggers when running:
taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40

Timer interruption from housekeeping vcpus is normal to me 
(without requiring kvm_hlt_in_guest).

[ 1145.849646] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 1145.850481] #PF: supervisor instruction fetch in kernel mode
[ 1145.851161] #PF: error_code(0x0010) - not-present page
[ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
PMD 0 
[ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
[ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
[ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[ 1145.854554] RIP: 0010:0x0
[ 1145.854879] Code: Bad RIP value.
[ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
[ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
00000000c526b7c4              
[ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
ffff8882b58a8320              
[ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
0000000000000832              
[ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000              
[ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
0000000000000002              
[ 1145.860047] FS:  0000000000000000(0000) GS:ffff8882b5880000(0000)
knlGS:0000000000000000   
[ 1145.860994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                              
[ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
0000000000160ee0              
[ 1145.862570] Call Trace:                                                                    
[ 1145.862877]  cpuidle_enter_state+0x7c/0x3e0                                                
[ 1145.863392]  cpuidle_enter+0x29/0x40                                                       


> I think we can still be woken up from kvm_vcpu_block() if pir is set.

Exactly.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-21 21:42           ` Marcelo Tosatti
@ 2019-06-24  8:53             ` Wanpeng Li
  2019-06-25 19:00               ` Marcelo Tosatti
  2019-06-25 17:02             ` Paolo Bonzini
  1 sibling, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-24  8:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > Hi Li,
> > >
> > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > > >
> > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > timer fires in dedicated instance scenario.
> > > > > >
> > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > timer fires.
> > > > > >
> > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > >
> > > > > > w/o patch:
> > > > > >
> > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > > >
> > > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > > >
> > > > > > w/ patch:
> > > > > >
> > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > > >
> > > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > > >
> > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > > ---
> > > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > > >  include/linux/sched/isolation.h |  2 ++
> > > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > index 87ecb56..9ceeee5 100644
> > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > >       return apic->vcpu->vcpu_id;
> > > > > >  }
> > > > > >
> > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > >
> > > > > Paolo, can you explain the reasoning behind this?
> > > > >
> > > > > Should not be necessary...
> >
> > https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > through kvm_apic_expired if the guest needs to be woken up from
> > kvm_vcpu_block."
>
> Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> be woken up, if it receives a timer interrupt.
>
> But your patch will go through:
>
> kvm_apic_inject_pending_timer_irqs
> __apic_accept_irq ->
> vmx_deliver_posted_interrupt ->
> kvm_vcpu_trigger_posted_interrupt returns false
> (because vcpu->mode != IN_GUEST_MODE) ->
> kvm_vcpu_kick
>
> Which will wakeup the vcpu.

Hi Marcelo,

>
> Apart from this oops, which triggers when running:
> taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40

I try both host and guest use latest kvm/queue  w/ CONFIG_PREEMPT
enabled, and expose mwait as your config, however, there is no oops.
Can you reproduce steadily or encounter casually? Can you reproduce
w/o the patchset?

>
> Timer interruption from housekeeping vcpus is normal to me
> (without requiring kvm_hlt_in_guest).
>
> [ 1145.849646] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [ 1145.850481] #PF: supervisor instruction fetch in kernel mode
> [ 1145.851161] #PF: error_code(0x0010) - not-present page
> [ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
> PMD 0
> [ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
> [ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
> [ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> [ 1145.854554] RIP: 0010:0x0
> [ 1145.854879] Code: Bad RIP value.
> [ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
> [ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
> 00000000c526b7c4
> [ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
> ffff8882b58a8320
> [ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
> 0000000000000832
> [ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
> 0000000000000002
> [ 1145.860047] FS:  0000000000000000(0000) GS:ffff8882b5880000(0000)
> knlGS:0000000000000000
> [ 1145.860994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
> 0000000000160ee0
> [ 1145.862570] Call Trace:
> [ 1145.862877]  cpuidle_enter_state+0x7c/0x3e0
> [ 1145.863392]  cpuidle_enter+0x29/0x40
>
>
> > I think we can still be woken up from kvm_vcpu_block() if pir is set.
>
> Exactly.
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-21 21:42           ` Marcelo Tosatti
  2019-06-24  8:53             ` Wanpeng Li
@ 2019-06-25 17:02             ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2019-06-25 17:02 UTC (permalink / raw)
  To: Marcelo Tosatti, Wanpeng Li; +Cc: LKML, kvm, Radim Krčmář

On 21/06/19 23:42, Marcelo Tosatti wrote:
> On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
>> On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>>
>>> Hi Li,
>>>
>>> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
>>>> On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>>>>
>>>>> On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
>>>>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>>>>
>>>>>> Dedicated instances are currently disturbed by unnecessary jitter due
>>>>>> to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
>>>>>> There is no hardware virtual timer on Intel for guest like ARM. Both
>>>>>> programming timer in guest and the emulated timer fires incur vmexits.
>>>>>> This patch tries to avoid vmexit which is incurred by the emulated
>>>>>> timer fires in dedicated instance scenario.
>>>>>>
>>>>>> When nohz_full is enabled in dedicated instances scenario, the emulated
>>>>>> timers can be offload to the nearest busy housekeeping cpus since APICv
>>>>>> is really common in recent years. The guest timer interrupt is injected
>>>>>> by posted-interrupt which is delivered by housekeeping cpu once the emulated
>>>>>> timer fires.
>>>>>>
>>>>>> The host admin should fine tuned, e.g. dedicated instances scenario w/
>>>>>> nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
>>>>>> for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
>>>>>> mode, ~3% redis performance benefit can be observed on Skylake server.
>>>>>>
>>>>>> w/o patch:
>>>>>>
>>>>>>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
>>>>>>
>>>>>> EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
>>>>>>
>>>>>> w/ patch:
>>>>>>
>>>>>>             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
>>>>>>
>>>>>> EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
>>>>>>
>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>>>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
>>>>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
>>>>>>  arch/x86/kvm/lapic.h            |  1 +
>>>>>>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>>>>>>  arch/x86/kvm/x86.c              |  5 +++++
>>>>>>  arch/x86/kvm/x86.h              |  2 ++
>>>>>>  include/linux/sched/isolation.h |  2 ++
>>>>>>  kernel/sched/isolation.c        |  6 ++++++
>>>>>>  7 files changed, 44 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>>> index 87ecb56..9ceeee5 100644
>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>> @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>>>>>>       return apic->vcpu->vcpu_id;
>>>>>>  }
>>>>>>
>>>>>> +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
>>>>>> +             kvm_hlt_in_guest(vcpu->kvm);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>>>>>
>>>>> Paolo, can you explain the reasoning behind this?
>>>>>
>>>>> Should not be necessary...
>>
>> https://lkml.org/lkml/2019/6/5/436  "Here you need to check
>> kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
>> through kvm_apic_expired if the guest needs to be woken up from
>> kvm_vcpu_block."
> 
> Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> be woken up, if it receives a timer interrupt.

Yes, this is true.

Paolo

> But your patch will go through:
> 
> kvm_apic_inject_pending_timer_irqs
> __apic_accept_irq -> 
> vmx_deliver_posted_interrupt ->
> kvm_vcpu_trigger_posted_interrupt returns false
> (because vcpu->mode != IN_GUEST_MODE) ->
> kvm_vcpu_kick
> 
> Which will wakeup the vcpu.
> 
> Apart from this oops, which triggers when running:
> taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> 
> Timer interruption from housekeeping vcpus is normal to me 
> (without requiring kvm_hlt_in_guest).
> 
> [ 1145.849646] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [ 1145.850481] #PF: supervisor instruction fetch in kernel mode
> [ 1145.851161] #PF: error_code(0x0010) - not-present page
> [ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
> PMD 0 
> [ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
> [ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
> [ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> [ 1145.854554] RIP: 0010:0x0
> [ 1145.854879] Code: Bad RIP value.
> [ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
> [ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
> 00000000c526b7c4              
> [ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
> ffff8882b58a8320              
> [ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
> 0000000000000832              
> [ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000              
> [ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
> 0000000000000002              
> [ 1145.860047] FS:  0000000000000000(0000) GS:ffff8882b5880000(0000)
> knlGS:0000000000000000   
> [ 1145.860994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                              
> [ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
> 0000000000160ee0              
> [ 1145.862570] Call Trace:                                                                    
> [ 1145.862877]  cpuidle_enter_state+0x7c/0x3e0                                                
> [ 1145.863392]  cpuidle_enter+0x29/0x40                                                       
> 
> 
>> I think we can still be woken up from kvm_vcpu_block() if pir is set.
> 
> Exactly.
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-24  8:53             ` Wanpeng Li
@ 2019-06-25 19:00               ` Marcelo Tosatti
  2019-06-26 11:02                 ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2019-06-25 19:00 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > >
> > > > Hi Li,
> > > >
> > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > > > >
> > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > timer fires in dedicated instance scenario.
> > > > > > >
> > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > timer fires.
> > > > > > >
> > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > >
> > > > > > > w/o patch:
> > > > > > >
> > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > > > >
> > > > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > > > >
> > > > > > > w/ patch:
> > > > > > >
> > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > > > >
> > > > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > > > >
> > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > ---
> > > > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > > > >  include/linux/sched/isolation.h |  2 ++
> > > > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > >       return apic->vcpu->vcpu_id;
> > > > > > >  }
> > > > > > >
> > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > >
> > > > > > Paolo, can you explain the reasoning behind this?
> > > > > >
> > > > > > Should not be necessary...
> > >
> > > https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > through kvm_apic_expired if the guest needs to be woken up from
> > > kvm_vcpu_block."
> >
> > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > be woken up, if it receives a timer interrupt.
> >
> > But your patch will go through:
> >
> > kvm_apic_inject_pending_timer_irqs
> > __apic_accept_irq ->
> > vmx_deliver_posted_interrupt ->
> > kvm_vcpu_trigger_posted_interrupt returns false
> > (because vcpu->mode != IN_GUEST_MODE) ->
> > kvm_vcpu_kick
> >
> > Which will wakeup the vcpu.
> 
> Hi Marcelo,
> 
> >
> > Apart from this oops, which triggers when running:
> > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> 
> I try both host and guest use latest kvm/queue  w/ CONFIG_PREEMPT
> enabled, and expose mwait as your config, however, there is no oops.
> Can you reproduce steadily or encounter casually? Can you reproduce
> w/o the patchset?

Hi Li,

Steadily.

Do you have this as well:

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -129,8 +129,7 @@ static inline u32 kvm_x2apic_id(struct k

 bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
 {
-       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
-               kvm_hlt_in_guest(vcpu->kvm);
+       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
 }
 EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-25 19:00               ` Marcelo Tosatti
@ 2019-06-26 11:02                 ` Wanpeng Li
  2019-06-26 16:44                   ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Wanpeng Li @ 2019-06-26 11:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > >
> > > > > Hi Li,
> > > > >
> > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > >
> > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > >
> > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > timer fires.
> > > > > > > >
> > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > >
> > > > > > > > w/o patch:
> > > > > > > >
> > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > > > > >
> > > > > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > > > > >
> > > > > > > > w/ patch:
> > > > > > > >
> > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > > > > >
> > > > > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > > > > >
> > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > > ---
> > > > > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > > > > >  include/linux/sched/isolation.h |  2 ++
> > > > > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > >       return apic->vcpu->vcpu_id;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > +{
> > > > > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > >
> > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > >
> > > > > > > Should not be necessary...
> > > >
> > > > https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > kvm_vcpu_block."
> > >
> > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > be woken up, if it receives a timer interrupt.
> > >
> > > But your patch will go through:
> > >
> > > kvm_apic_inject_pending_timer_irqs
> > > __apic_accept_irq ->
> > > vmx_deliver_posted_interrupt ->
> > > kvm_vcpu_trigger_posted_interrupt returns false
> > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > kvm_vcpu_kick
> > >
> > > Which will wakeup the vcpu.
> >
> > Hi Marcelo,
> >
> > >
> > > Apart from this oops, which triggers when running:
> > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> >
> > I try both host and guest use latest kvm/queue  w/ CONFIG_PREEMPT
> > enabled, and expose mwait as your config, however, there is no oops.
> > Can you reproduce steadily or encounter casually? Can you reproduce
> > w/o the patchset?
>
> Hi Li,

Hi Marcelo,

>
> Steadily.
>
> Do you have this as well:

w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
didn't see any oops. Could you observe the oops disappear when w/o
below diff? If the answer is yes, then the oops will not block to
merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
condition to guarantee be woken up from kvm_vcpu_block(). For the
exitless injection if the guest is running(DPDK style workloads that
busy-spin on network card) scenarios, we can find a solution later.

Regards,
Wanpeng Li

>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -129,8 +129,7 @@ static inline u32 kvm_x2apic_id(struct k
>
>  bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>  {
> -       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> -               kvm_hlt_in_guest(vcpu->kvm);
> +       return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-26 11:02                 ` Wanpeng Li
@ 2019-06-26 16:44                   ` Marcelo Tosatti
  2019-06-28  8:26                     ` Wanpeng Li
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2019-06-26 16:44 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Wed, Jun 26, 2019 at 07:02:13PM +0800, Wanpeng Li wrote:
> On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > >
> > > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > >
> > > > > > Hi Li,
> > > > > >
> > > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > > >
> > > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > > >
> > > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > > timer fires.
> > > > > > > > >
> > > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > > >
> > > > > > > > > w/o patch:
> > > > > > > > >
> > > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > > > > > >
> > > > > > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > > > > > >
> > > > > > > > > w/ patch:
> > > > > > > > >
> > > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > > > > > >
> > > > > > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > > > > > >
> > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > > > ---
> > > > > > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > > > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > > > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > > > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > > > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > > > > > >  include/linux/sched/isolation.h |  2 ++
> > > > > > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > > > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > > >       return apic->vcpu->vcpu_id;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > > +{
> > > > > > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > > >
> > > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > > >
> > > > > > > > Should not be necessary...
> > > > >
> > > > > https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> > > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > > kvm_vcpu_block."
> > > >
> > > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > > be woken up, if it receives a timer interrupt.
> > > >
> > > > But your patch will go through:
> > > >
> > > > kvm_apic_inject_pending_timer_irqs
> > > > __apic_accept_irq ->
> > > > vmx_deliver_posted_interrupt ->
> > > > kvm_vcpu_trigger_posted_interrupt returns false
> > > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > > kvm_vcpu_kick
> > > >
> > > > Which will wakeup the vcpu.
> > >
> > > Hi Marcelo,
> > >
> > > >
> > > > Apart from this oops, which triggers when running:
> > > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> > >
> > > I try both host and guest use latest kvm/queue  w/ CONFIG_PREEMPT
> > > enabled, and expose mwait as your config, however, there is no oops.
> > > Can you reproduce steadily or encounter casually? Can you reproduce
> > > w/o the patchset?
> >
> > Hi Li,
> 
> Hi Marcelo,
> 
> >
> > Steadily.
> >
> > Do you have this as well:
> 
> w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
> didn't see any oops. Could you observe the oops disappear when w/o
> below diff? If the answer is yes, then the oops will not block to
> merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
> condition to guarantee be woken up from kvm_vcpu_block(). 

He agreed that its not necessary. Removing the HLT in guest widens 
the scope of the patch greatly.

> For the
> exitless injection if the guest is running(DPDK style workloads that
> busy-spin on network card) scenarios, we can find a solution later.

What is the use-case for HLT in guest again?

I'll find the source for the oops (or confirm can't reproduce with 
kvm/queue RSN).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
  2019-06-26 16:44                   ` Marcelo Tosatti
@ 2019-06-28  8:26                     ` Wanpeng Li
  0 siblings, 0 replies; 23+ messages in thread
From: Wanpeng Li @ 2019-06-28  8:26 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář

On Thu, 27 Jun 2019 at 00:44, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Wed, Jun 26, 2019 at 07:02:13PM +0800, Wanpeng Li wrote:
> > On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > > > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > >
> > > > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > > >
> > > > > > > Hi Li,
> > > > > > >
> > > > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > > > >
> > > > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > > > >
> > > > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > > > timer fires.
> > > > > > > > > >
> > > > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > > > >
> > > > > > > > > > w/o patch:
> > > > > > > > > >
> > > > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
> > > > > > > > > >
> > > > > > > > > > EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
> > > > > > > > > >
> > > > > > > > > > w/ patch:
> > > > > > > > > >
> > > > > > > > > >             VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
> > > > > > > > > >
> > > > > > > > > > EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
> > > > > > > > > >
> > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > > > > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > > > > > > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > > > > > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > > > > > > > ---
> > > > > > > > > >  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
> > > > > > > > > >  arch/x86/kvm/lapic.h            |  1 +
> > > > > > > > > >  arch/x86/kvm/vmx/vmx.c          |  3 ++-
> > > > > > > > > >  arch/x86/kvm/x86.c              |  5 +++++
> > > > > > > > > >  arch/x86/kvm/x86.h              |  2 ++
> > > > > > > > > >  include/linux/sched/isolation.h |  2 ++
> > > > > > > > > >  kernel/sched/isolation.c        |  6 ++++++
> > > > > > > > > >  7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > > > >       return apic->vcpu->vcpu_id;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > > > +{
> > > > > > > > > > +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > > > +             kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > > > >
> > > > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > > > >
> > > > > > > > > Should not be necessary...
> > > > > >
> > > > > > https://lkml.org/lkml/2019/6/5/436  "Here you need to check
> > > > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > > > kvm_vcpu_block."
> > > > >
> > > > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > > > be woken up, if it receives a timer interrupt.
> > > > >
> > > > > But your patch will go through:
> > > > >
> > > > > kvm_apic_inject_pending_timer_irqs
> > > > > __apic_accept_irq ->
> > > > > vmx_deliver_posted_interrupt ->
> > > > > kvm_vcpu_trigger_posted_interrupt returns false
> > > > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > > > kvm_vcpu_kick
> > > > >
> > > > > Which will wakeup the vcpu.
> > > >
> > > > Hi Marcelo,
> > > >
> > > > >
> > > > > Apart from this oops, which triggers when running:
> > > > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> > > >
> > > > I try both host and guest use latest kvm/queue  w/ CONFIG_PREEMPT
> > > > enabled, and expose mwait as your config, however, there is no oops.
> > > > Can you reproduce steadily or encounter casually? Can you reproduce
> > > > w/o the patchset?
> > >
> > > Hi Li,
> >
> > Hi Marcelo,
> >
> > >
> > > Steadily.
> > >
> > > Do you have this as well:
> >
> > w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
> > didn't see any oops. Could you observe the oops disappear when w/o
> > below diff? If the answer is yes, then the oops will not block to
> > merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
> > condition to guarantee be woken up from kvm_vcpu_block().
>
> He agreed that its not necessary. Removing the HLT in guest widens
> the scope of the patch greatly.
>
> > For the
> > exitless injection if the guest is running(DPDK style workloads that
> > busy-spin on network card) scenarios, we can find a solution later.
>
> What is the use-case for HLT in guest again?

Together w/ mwait/hlt/pause no vmexits for dedicated instances. In
addition, hlt in guest will disable pv qspinlock which is not optimal
for dedicated instance. Refer to commit
b2798ba0b876 (KVM: X86: Choose qspinlock when dedicated physical CPUs
are available) and commit caa057a2cad64 (KVM: X86: Provide a
capability to disable HLT intercepts).

>
> I'll find the source for the oops (or confirm can't reproduce with
> kvm/queue RSN).

Looking forward to it, thanks Marcelo, hope we can catch the upcoming
merge window. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-17 11:24 [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer Wanpeng Li
2019-06-17 11:24 ` [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned Wanpeng Li
2019-06-17 11:48   ` Peter Xu
2019-06-18  0:38     ` Wanpeng Li
2019-06-17 11:24 ` [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt Wanpeng Li
2019-06-18 13:35   ` Marcelo Tosatti
2019-06-19  0:36     ` Wanpeng Li
2019-06-19 21:03       ` Marcelo Tosatti
2019-06-20  0:52         ` Wanpeng Li
2019-06-21  1:42         ` Wanpeng Li
2019-06-21 21:42           ` Marcelo Tosatti
2019-06-24  8:53             ` Wanpeng Li
2019-06-25 19:00               ` Marcelo Tosatti
2019-06-26 11:02                 ` Wanpeng Li
2019-06-26 16:44                   ` Marcelo Tosatti
2019-06-28  8:26                     ` Wanpeng Li
2019-06-25 17:02             ` Paolo Bonzini
2019-06-17 11:24 ` [PATCH v4 3/5] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi Wanpeng Li
2019-06-17 11:24 ` [PATCH v4 4/5] KVM: LAPIC: Don't posted inject already-expired timer Wanpeng Li
2019-06-17 11:24 ` [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer Wanpeng Li
2019-06-17 21:32   ` Radim Krčmář
2019-06-18  0:44     ` Wanpeng Li
2019-06-18  0:57       ` Wanpeng Li

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox