linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
@ 2019-08-05  2:03 Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 2/6] KVM: LAPIC: Don't need to wakeup vCPU twice after timer fire Wanpeng Li
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable

From: Wanpeng Li <wanpengli@tencent.com>

After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a 
five-year-old bug is exposed. Running the ebizzy benchmark in three 80-vCPU 
VMs on one 80-pCPU Skylake server, a lot of rcu_sched stall warnings splat 
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() remains true until finish_swait() is called in 
kvm_vcpu_block(), so voluntarily preempted vCPUs are also taken into 
account by the kvm_vcpu_on_spin() loop. This greatly increases the 
probability that the condition kvm_arch_vcpu_runnable(vcpu) is checked 
and can be true; when APICv is enabled, the yield-candidate vCPU's VMCS 
RVI field leaks (via vmx_sync_pir_to_irr()) into the 
spinning-on-a-taken-lock vCPU's current VMCS.

This patch fixes it by conservatively checking only a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v3 -> v4:
 * just test KVM_REQ_*
 * rename the hook to apicv_has_pending_interrupt
 * wrap with #ifdef CONFIG_KVM_ASYNC_PF 
v2 -> v3:
 * check conservatively a subset of events
v1 -> v2:
 * checking swait_active(&vcpu->wq) for involuntary preemption

 arch/mips/kvm/mips.c            |  5 +++++
 arch/powerpc/kvm/powerpc.c      |  5 +++++
 arch/s390/kvm/kvm-s390.c        |  5 +++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  6 ++++++
 arch/x86/kvm/vmx/vmx.c          |  6 ++++++
 arch/x86/kvm/x86.c              | 16 ++++++++++++++++
 include/linux/kvm_host.h        |  1 +
 virt/kvm/arm/arm.c              |  5 +++++
 virt/kvm/kvm_main.c             | 16 +++++++++++++++-
 10 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 2cfe839..95a4642 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -98,6 +98,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return !!(vcpu->arch.pending_exceptions);
 }
 
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_runnable(vcpu);
+}
+
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
 	return false;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0dba7eb..3e34d5f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -50,6 +50,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
 }
 
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_runnable(vcpu);
+}
+
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
 	return false;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3f520cd8..5623b23 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3102,6 +3102,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_s390_vcpu_has_irq(vcpu, 0);
 }
 
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_runnable(vcpu);
+}
+
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b0a4ee..25ffa7c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1175,6 +1175,7 @@ struct kvm_x86_ops {
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
+	bool (*apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
 
 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7eafc69..1b4384f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5190,6 +5190,11 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		kvm_vcpu_wake_up(vcpu);
 }
 
+static bool svm_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 {
 	unsigned long flags;
@@ -7314,6 +7319,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 
 	.pmu_ops = &amd_pmu_ops,
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
+	.apicv_has_pending_interrupt = svm_apicv_has_pending_interrupt,
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 074385c..59871b6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6117,6 +6117,11 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	return max_irr;
 }
 
+static bool vmx_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
+{
+	return pi_test_on(vcpu_to_pi_desc(vcpu));
+}
+
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
 	if (!kvm_vcpu_apicv_active(vcpu))
@@ -7726,6 +7731,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+	.apicv_has_pending_interrupt = vmx_apicv_has_pending_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c6d951c..f715efb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9698,6 +9698,22 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+	if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
+		return true;
+
+	if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
+		kvm_test_request(KVM_REQ_SMI, vcpu) ||
+		 kvm_test_request(KVM_REQ_EVENT, vcpu))
+		return true;
+
+	if (vcpu->arch.apicv_active && kvm_x86_ops->apicv_has_pending_interrupt(vcpu))
+		return true;
+
+	return false;
+}
+
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.preempted_in_kernel;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5c5b586..9e4c2bb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -872,6 +872,7 @@ int kvm_arch_check_processor_compat(void);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index acc4324..2927895 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -444,6 +444,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 		&& !v->arch.power_off && !v->arch.pause);
 }
 
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_runnable(vcpu);
+}
+
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
 	return vcpu_mode_priv(vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0..e121423 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2477,6 +2477,20 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static bool vcpu_runnable(struct kvm_vcpu *vcpu)
+{
+	/* It is called outside vcpu_load/vcpu_put */
+	if (kvm_arch_dy_runnable(vcpu))
+		return true;
+
+#ifdef CONFIG_KVM_ASYNC_PF
+	if (!list_empty_careful(&vcpu->async_pf.done))
+		return true;
+#endif
+
+	return false;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	struct kvm *kvm = me->kvm;
@@ -2506,7 +2520,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+			if (swait_active(&vcpu->wq) && !vcpu_runnable(vcpu))
 				continue;
 			if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu))
 				continue;
-- 
2.7.4



* [PATCH v4 2/6] KVM: LAPIC: Don't need to wakeup vCPU twice after timer fire
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
@ 2019-08-05  2:03 ` Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 3/6] KVM: Check preempted_in_kernel for involuntary preemption Wanpeng Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

kvm_set_pending_timer() already takes care of waking up a sleeping vCPU
that has a pending timer, so there is no need to check this again in
apic_timer_expired().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0aa1586..685d17c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1548,7 +1548,6 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
 static void apic_timer_expired(struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
-	struct swait_queue_head *q = &vcpu->wq;
 	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	if (atomic_read(&apic->lapic_timer.pending))
@@ -1566,13 +1565,6 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 
 	atomic_inc(&apic->lapic_timer.pending);
 	kvm_set_pending_timer(vcpu);
-
-	/*
-	 * For x86, the atomic_inc() is serialized, thus
-	 * using swait_active() is safe.
-	 */
-	if (swait_active(q))
-		swake_up_one(q);
 }
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
-- 
2.7.4



* [PATCH v4 3/6] KVM: Check preempted_in_kernel for involuntary preemption
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 2/6] KVM: LAPIC: Don't need to wakeup vCPU twice after timer fire Wanpeng Li
@ 2019-08-05  2:03 ` Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 4/6] KVM: X86: Use IPI shorthands in kvm guest when supported Wanpeng Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

preempted_in_kernel is updated in the preempt notifier when involuntary
preemption occurs, so it can be stale by the time voluntarily preempted
vCPUs are considered by the kvm_vcpu_on_spin() loop. This patch makes the
loop consult preempted_in_kernel only for involuntarily preempted vCPUs
(i.e. when vcpu->preempted is true).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 virt/kvm/kvm_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e121423..2ae9e84 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2522,7 +2522,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (swait_active(&vcpu->wq) && !vcpu_runnable(vcpu))
 				continue;
-			if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu))
+			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
+				!kvm_arch_vcpu_in_kernel(vcpu))
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
@@ -4219,7 +4220,7 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-	vcpu->preempted = false;
+	WRITE_ONCE(vcpu->preempted, false);
 	WRITE_ONCE(vcpu->ready, false);
 
 	kvm_arch_sched_in(vcpu, cpu);
@@ -4233,7 +4234,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
 	if (current->state == TASK_RUNNING) {
-		vcpu->preempted = true;
+		WRITE_ONCE(vcpu->preempted, true);
 		WRITE_ONCE(vcpu->ready, true);
 	}
 	kvm_arch_vcpu_put(vcpu);
-- 
2.7.4



* [PATCH v4 4/6] KVM: X86: Use IPI shorthands in kvm guest when supported
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 2/6] KVM: LAPIC: Don't need to wakeup vCPU twice after timer fire Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 3/6] KVM: Check preempted_in_kernel for involuntary preemption Wanpeng Li
@ 2019-08-05  2:03 ` Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 5/6] KVM: LAPIC: Add pv ipi tracepoint Wanpeng Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Sean Christopherson, Nadav Amit

From: Wanpeng Li <wanpengli@tencent.com>

IPI shorthands are now supported by the Linux apic/x2apic drivers. Switch
to the "all excluding self" and "all including self" destination
shorthands in the KVM guest to avoid splitting the target mask into
several PV IPI hypercalls. This patch removes kvm_send_ipi_all() and
kvm_send_ipi_allbutself(), since the callers in the APIC code already
check apic_use_ipi_shorthand and fall back to ->send_IPI_mask and
->send_IPI_mask_allbutself when it is false.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
Note: rebased against the tip tree's x86/apic branch; only KVM code is modified
v1 -> v2:
 * remove kvm_send_ipi_all() and kvm_send_ipi_allbutself()

 arch/x86/kernel/kvm.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b7f34fe..96626d8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -505,16 +505,6 @@ static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
 	__send_ipi_mask(local_mask, vector);
 }
 
-static void kvm_send_ipi_allbutself(int vector)
-{
-	kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
-}
-
-static void kvm_send_ipi_all(int vector)
-{
-	__send_ipi_mask(cpu_online_mask, vector);
-}
-
 /*
  * Set the IPI entry points
  */
@@ -522,8 +512,6 @@ static void kvm_setup_pv_ipi(void)
 {
 	apic->send_IPI_mask = kvm_send_ipi_mask;
 	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
-	apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
-	apic->send_IPI_all = kvm_send_ipi_all;
 	pr_info("KVM setup pv IPIs\n");
 }
 
-- 
2.7.4



* [PATCH v4 5/6] KVM: LAPIC: Add pv ipi tracepoint
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-08-05  2:03 ` [PATCH v4 4/6] KVM: X86: Use IPI shorthands in kvm guest when supported Wanpeng Li
@ 2019-08-05  2:03 ` Wanpeng Li
  2019-08-05  2:03 ` [PATCH v4 6/6] KVM: X86: Add pv tlb shootdown tracepoint Wanpeng Li
  2019-08-05 23:17 ` [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Paolo Bonzini
  5 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

Add pv ipi tracepoint.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c |  2 ++
 arch/x86/kvm/trace.h | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 685d17c..df5cd07 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -570,6 +570,8 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 	irq.level = (icr & APIC_INT_ASSERT) != 0;
 	irq.trig_mode = icr & APIC_INT_LEVELTRIG;
 
+	trace_kvm_pv_send_ipi(irq.vector, min, ipi_bitmap_low, ipi_bitmap_high);
+
 	if (icr & APIC_DEST_MASK)
 		return -KVM_EINVAL;
 	if (icr & APIC_SHORT_MASK)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5c831e..ce6ee34 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1462,6 +1462,31 @@ TRACE_EVENT(kvm_hv_send_ipi_ex,
 		  __entry->vector, __entry->format,
 		  __entry->valid_bank_mask)
 );
+
+/*
+ * Tracepoints for kvm_pv_send_ipi.
+ */
+TRACE_EVENT(kvm_pv_send_ipi,
+	TP_PROTO(u32 vector, u32 min, unsigned long ipi_bitmap_low, unsigned long ipi_bitmap_high),
+	TP_ARGS(vector, min, ipi_bitmap_low, ipi_bitmap_high),
+
+	TP_STRUCT__entry(
+		__field(u32, vector)
+		__field(u32, min)
+		__field(unsigned long, ipi_bitmap_low)
+		__field(unsigned long, ipi_bitmap_high)
+	),
+
+	TP_fast_assign(
+		__entry->vector = vector;
+		__entry->min = min;
+		__entry->ipi_bitmap_low = ipi_bitmap_low;
+		__entry->ipi_bitmap_high = ipi_bitmap_high;
+	),
+
+	TP_printk("vector %d min 0x%x ipi_bitmap_low 0x%lx ipi_bitmap_high 0x%lx",
+		  __entry->vector, __entry->min, __entry->ipi_bitmap_low, __entry->ipi_bitmap_high)
+);
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.7.4



* [PATCH v4 6/6] KVM: X86: Add pv tlb shootdown tracepoint
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
                   ` (3 preceding siblings ...)
  2019-08-05  2:03 ` [PATCH v4 5/6] KVM: LAPIC: Add pv ipi tracepoint Wanpeng Li
@ 2019-08-05  2:03 ` Wanpeng Li
  2019-08-05 23:17 ` [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Paolo Bonzini
  5 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-05  2:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

Add pv tlb shootdown tracepoint.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/trace.h | 19 +++++++++++++++++++
 arch/x86/kvm/x86.c   |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index ce6ee34..84f32d3 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1487,6 +1487,25 @@ TRACE_EVENT(kvm_pv_send_ipi,
 	TP_printk("vector %d min 0x%x ipi_bitmap_low 0x%lx ipi_bitmap_high 0x%lx",
 		  __entry->vector, __entry->min, __entry->ipi_bitmap_low, __entry->ipi_bitmap_high)
 );
+
+TRACE_EVENT(kvm_pv_tlb_flush,
+	TP_PROTO(unsigned int vcpu_id, bool need_flush_tlb),
+	TP_ARGS(vcpu_id, need_flush_tlb),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
+		__field(	bool,	need_flush_tlb		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+		__entry->need_flush_tlb = need_flush_tlb;
+	),
+
+	TP_printk("vcpu %u need_flush_tlb %s", __entry->vcpu_id,
+		__entry->need_flush_tlb ? "true" : "false")
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f715efb..7a302cd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2452,6 +2452,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	 * Doing a TLB flush here, on the guest's behalf, can avoid
 	 * expensive IPIs.
 	 */
+	trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
+		vcpu->arch.st.steal.preempted & KVM_VCPU_FLUSH_TLB);
 	if (xchg(&vcpu->arch.st.steal.preempted, 0) & KVM_VCPU_FLUSH_TLB)
 		kvm_vcpu_flush_tlb(vcpu, false);
 
-- 
2.7.4



* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-05  2:03 [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
                   ` (4 preceding siblings ...)
  2019-08-05  2:03 ` [PATCH v4 6/6] KVM: X86: Add pv tlb shootdown tracepoint Wanpeng Li
@ 2019-08-05 23:17 ` Paolo Bonzini
  2019-08-06  0:35   ` Wanpeng Li
  5 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2019-08-05 23:17 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Christian Borntraeger, Marc Zyngier, stable

On 05/08/19 04:03, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> [...]
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 2cfe839..95a4642 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -98,6 +98,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  	return !!(vcpu->arch.pending_exceptions);
>  }
>  
> +bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)

Using a __weak definition for the default implementation is a bit more
concise.  Queued with that change.

Paolo


* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-05 23:17 ` [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU Paolo Bonzini
@ 2019-08-06  0:35   ` Wanpeng Li
  2019-08-06  6:20     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2019-08-06  0:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable # v3.10+

On Tue, 6 Aug 2019 at 07:17, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 05/08/19 04:03, Wanpeng Li wrote:
> > [...]
> >
> > +bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
>
> Using a __weak definition for the default implementation is a bit more
> concise.  Queued with that change.

Thank you, Paolo! Btw, how about other 5 patches?

Regards,
Wanpeng Li


* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-06  0:35   ` Wanpeng Li
@ 2019-08-06  6:20     ` Paolo Bonzini
  2019-08-22  0:46       ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2019-08-06  6:20 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable # v3.10+

On 06/08/19 02:35, Wanpeng Li wrote:
> Thank you, Paolo! Btw, how about other 5 patches?

Queued everything else too.

Paolo

> Regards,
> Wanpeng Li



* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-06  6:20     ` Paolo Bonzini
@ 2019-08-22  0:46       ` Wanpeng Li
  2019-08-22  8:35         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2019-08-22  0:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable # v3.10+

On Tue, 6 Aug 2019 at 14:20, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 06/08/19 02:35, Wanpeng Li wrote:
> > Thank you, Paolo! Btw, how about other 5 patches?
>
> Queued everything else too.

How about patch 4/6~5/6, they are not in kvm/queue. :)

Regards,
Wanpeng Li


* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-22  0:46       ` Wanpeng Li
@ 2019-08-22  8:35         ` Paolo Bonzini
  2019-08-22  8:45           ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2019-08-22  8:35 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable # v3.10+

On 22/08/19 02:46, Wanpeng Li wrote:
> On Tue, 6 Aug 2019 at 14:20, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 06/08/19 02:35, Wanpeng Li wrote:
>>> Thank you, Paolo! Btw, how about other 5 patches?
>>
>> Queued everything else too.
> 
> How about patch 4/6~5/6, they are not in kvm/queue. :)

I queued 4.

For patch 5, I don't really see the benefit since the hypercall
arguments are already traced.

Paolo


* Re: [PATCH v4 1/6] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-08-22  8:35         ` Paolo Bonzini
@ 2019-08-22  8:45           ` Wanpeng Li
  0 siblings, 0 replies; 12+ messages in thread
From: Wanpeng Li @ 2019-08-22  8:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Christian Borntraeger, Marc Zyngier, stable # v3.10+

On Thu, 22 Aug 2019 at 16:35, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 22/08/19 02:46, Wanpeng Li wrote:
> > On Tue, 6 Aug 2019 at 14:20, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 06/08/19 02:35, Wanpeng Li wrote:
> >>> Thank you, Paolo! Btw, how about other 5 patches?
> >>
> >> Queued everything else too.
> >
> > How about patch 4/6~5/6, they are not in kvm/queue. :)
>
> I queued 4.
>
> For patch 5, I don't really see the benefit since the hypercall
> arguments are already traced.

Agreed, thank you.

Regards,
Wanpeng Li

