* [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
@ 2016-12-19 16:17 Paolo Bonzini
  2016-12-19 16:17 ` [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC Paolo Bonzini
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

These are the fixed patches that now pass vmx.flat.  The issue in
"KVM: x86: do not scan IRR twice on APICv vmentry" was that,
in v1 of the patches, kvm_cpu_get_interrupt did not scan IRR
at all if it found PIR.ON=0.  This is now fixed in patches 4 and 5.
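
In sketch form, the v1 bug looked like this (illustrative user-space C,
not the kernel code; all names here are made up):

#include <stdbool.h>
#include <stdint.h>

/* Highest set vector across the eight 32-bit IRR words, -1 if none. */
static int highest_vector(const uint32_t irr[8])
{
	int i;

	for (i = 7; i >= 0; i--)
		if (irr[i])
			return 31 - __builtin_clz(irr[i]) + i * 32;
	return -1;
}

/* v1 (buggy): PIR.ON=0 was treated as "nothing pending at all". */
static int v1_highest_irr(bool pir_on, const uint32_t irr[8])
{
	if (!pir_on)
		return -1;	/* bug: the IRR may still hold a vector */
	return highest_vector(irr);
}

/* v2: PIR.ON=0 only means there is nothing new to merge from the PIR;
 * the IRR itself must still be scanned. */
static int v2_highest_irr(bool pir_on, const uint32_t irr[8])
{
	(void)pir_on;	/* the PIR merge step is omitted in this sketch */
	return highest_vector(irr);
}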

Another latent bug has to be fixed in patch 2.  The patch also
highlights the maze that check_nested_events has become.  Fixing
that is left for later.

The patches are on top of the (non-APICv-related) KVM_REQ_EVENT
patches from this morning.  According to kvm-unit-tests, patch 6 speeds
up self-IPIs (if not done through the accelerated self-IPI MSR) by 5-10%.

Paolo

Paolo Bonzini (6):
  KVM: vmx: clear pending interrupts on KVM_SET_LAPIC
  kvm: nVMX: move nested events check to kvm_vcpu_running
  KVM: x86: preparatory changes for APICv cleanups
  KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers
  KVM: x86: do not scan IRR twice on APICv vmentry
  kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            | 42 ++++++++++++++++++-------------
 arch/x86/kvm/lapic.h            |  4 +--
 arch/x86/kvm/svm.c              |  6 -----
 arch/x86/kvm/vmx.c              | 55 +++++++++++++++++++++++++++--------------
 arch/x86/kvm/x86.c              | 53 +++++++++++++++++++++------------------
 6 files changed, 93 insertions(+), 69 deletions(-)

-- 
1.8.3.1


* [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2017-02-07 17:42   ` Radim Krčmář
  2016-12-19 16:17 ` [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running Paolo Bonzini
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

Pending interrupts might be in the PI descriptor when the
LAPIC is restored from an external state; we do not want
them to be injected.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 3 +--
 arch/x86/kvm/vmx.c   | 9 +++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4dc02482faf7..5fa546e27b7e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2185,8 +2185,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 				1 : count_vectors(apic->regs + APIC_ISR);
 	apic->highest_isr_cache = -1;
 	if (vcpu->arch.apicv_active) {
-		if (kvm_x86_ops->apicv_post_state_restore)
-			kvm_x86_ops->apicv_post_state_restore(vcpu);
+		kvm_x86_ops->apicv_post_state_restore(vcpu);
 		kvm_x86_ops->hwapic_irr_update(vcpu,
 				apic_find_highest_irr(apic));
 		kvm_x86_ops->hwapic_isr_update(vcpu,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1ba7aaf6ae7f..661956caf162 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8761,6 +8761,14 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }
 
+static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	pi_clear_on(&vmx->pi_desc);
+	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -11590,6 +11598,7 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 	.get_enable_apicv = vmx_get_enable_apicv,
 	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
+	.apicv_post_state_restore = vmx_apicv_post_state_restore,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
-- 
1.8.3.1


* [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
  2016-12-19 16:17 ` [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2017-02-07 18:16   ` Radim Krčmář
  2016-12-19 16:17 ` [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups Paolo Bonzini
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
and the former does not call check_nested_events.

Once KVM_REQ_EVENT is removed from the APICv interrupt injection
path, however, this would leave no place to trigger a vmexit
from L2 to L1, causing a missed interrupt delivery while in guest
mode.  This is caught by the "ack interrupt on exit" test in
vmx.flat.

[This does not change the calls to check_nested_events in
 inject_pending_event.  That is material for a separate cleanup.]
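
Roughly, the loop in question has this shape (a condensed sketch of
vcpu_run in arch/x86/kvm/x86.c, with request and signal handling
elided), which is why the check has to live in kvm_vcpu_running:

static int vcpu_run(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
	int r = 1;

	while (r > 0) {
		if (kvm_vcpu_running(vcpu))	/* consulted on every iteration */
			r = vcpu_enter_guest(vcpu);
		else
			r = vcpu_block(kvm, vcpu);
		/* ... pending-request and signal checks elided ... */
	}
	return r;
}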

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d5a5fc2f8758..32e5f54a8eba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6920,6 +6920,9 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+		kvm_x86_ops->check_nested_events(vcpu, false);
+
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
 }
@@ -8286,9 +8289,6 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
-		kvm_x86_ops->check_nested_events(vcpu, false);
-
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
-- 
1.8.3.1


* [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
  2016-12-19 16:17 ` [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC Paolo Bonzini
  2016-12-19 16:17 ` [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2017-02-07 18:20   ` Radim Krčmář
  2016-12-19 16:17 ` [PATCH 4/6] KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers Paolo Bonzini
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
Also move vmx_sync_pir_to_irr down in vmx.c, after vmx_hwapic_irr_update,
so that a later patch can call vmx_hwapic_irr_update from it.
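
The fls() -> __fls() change in find_highest_vector below is
behavior-preserving: the if (*reg) guard guarantees a non-zero
argument, and __fls(x) == fls(x) - 1 for x != 0.  As a stand-alone
model of the reworked scan (user-space C; plain loads and stores
replace the kernel's READ_ONCE/xchg in this single-threaded sketch):

#include <stdint.h>
#include <stdio.h>

/* __fls() for a known non-zero word: index of the highest set bit. */
static int my_fls(uint32_t x)
{
	return 31 - __builtin_clz(x);
}

static int apic_update_irr(uint32_t pir[8], uint32_t irr[8])
{
	int i, vec, max_irr = -1;

	for (i = vec = 0; i <= 7; i++, vec += 32) {
		uint32_t irr_val = irr[i];

		if (pir[i]) {
			irr_val |= pir[i];	/* kernel: xchg(&pir[i], 0) */
			pir[i] = 0;
			irr[i] = irr_val;
		}
		if (irr_val)
			max_irr = my_fls(irr_val) + vec;
	}
	return max_irr;		/* highest pending vector, -1 if none */
}

int main(void)
{
	uint32_t pir[8] = { 0 }, irr[8] = { 0 };

	pir[1] = 1u << 3;	/* vector 35 posted by another CPU */
	irr[0] = 1u << 9;	/* vector 9 already in the IRR */
	printf("max_irr = %d\n", apic_update_irr(pir, irr));	/* 35 */
	return 0;
}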

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 25 +++++++++++++++++--------
 arch/x86/kvm/lapic.h |  4 ++--
 arch/x86/kvm/vmx.c   | 32 ++++++++++++++++----------------
 3 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fa546e27b7e..4c76c602576e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -341,7 +341,7 @@ static int find_highest_vector(void *bitmap)
 	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
 		reg = bitmap + REG_POS(vec);
 		if (*reg)
-			return fls(*reg) - 1 + vec;
+			return __fls(*reg) + vec;
 	}
 
 	return -1;
@@ -361,27 +361,36 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }
 
-void __kvm_apic_update_irr(u32 *pir, void *regs)
+int __kvm_apic_update_irr(u32 *pir, void *regs)
 {
-	u32 i, pir_val;
+	u32 i, vec;
+	u32 pir_val, irr_val;
+	int max_irr = -1;
 
-	for (i = 0; i <= 7; i++) {
+	for (i = vec = 0; i <= 7; i++, vec += 32) {
 		pir_val = READ_ONCE(pir[i]);
+		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
 		if (pir_val) {
-			pir_val = xchg(&pir[i], 0);
-			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
+			irr_val |= xchg(&pir[i], 0);
+			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
 		}
+		if (irr_val)
+			max_irr = __fls(irr_val) + vec;
 	}
+
+	return max_irr;
 }
 EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
 
-void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
+int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	int max_irr;
 
-	__kvm_apic_update_irr(pir, apic->regs);
+	max_irr = __kvm_apic_update_irr(pir, apic->regs);
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
+	return max_irr;
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 5b5b1ba644cb..8aa54fdc43d1 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -71,8 +71,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			   int short_hand, unsigned int dest, int dest_mode);
 
-void __kvm_apic_update_irr(u32 *pir, void *regs);
-void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
+int __kvm_apic_update_irr(u32 *pir, void *regs);
+int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 void kvm_apic_update_ppr(struct kvm_vcpu *vcpu);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 		     struct dest_map *dest_map);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 661956caf162..eab8ab023705 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5068,22 +5068,6 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 		kvm_vcpu_kick(vcpu);
 }
 
-static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	if (!pi_test_on(&vmx->pi_desc))
-		return;
-
-	pi_clear_on(&vmx->pi_desc);
-	/*
-	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
-	 * But on x86 this is just a compiler barrier anyway.
-	 */
-	smp_mb__after_atomic();
-	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
-}
-
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -8750,6 +8734,22 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 	}
 }
 
+static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!pi_test_on(&vmx->pi_desc))
+		return;
+
+	pi_clear_on(&vmx->pi_desc);
+	/*
+	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
+	 * But on x86 this is just a compiler barrier anyway.
+	 */
+	smp_mb__after_atomic();
+	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+}
+
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
 	if (!kvm_vcpu_apicv_active(vcpu))
-- 
1.8.3.1


* [PATCH 4/6] KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-12-19 16:17 ` [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2016-12-19 16:17 ` [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 7 ++++---
 arch/x86/kvm/x86.c   | 4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4c76c602576e..4af0e105caad 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -410,8 +410,6 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 	if (!apic->irr_pending)
 		return -1;
 
-	if (apic->vcpu->arch.apicv_active)
-		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
 	result = apic_search_irr(apic);
 	ASSERT(result == -1 || result >= 16);
 
@@ -581,7 +579,10 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 
 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
 {
-	int highest_irr = apic_find_highest_irr(apic);
+	int highest_irr;
+	if (apic->vcpu->arch.apicv_active)
+		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
+	highest_irr = apic_find_highest_irr(apic);
 	if (highest_irr == -1 || (highest_irr & 0xF0) <= ppr)
 		return -1;
 	return highest_irr;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 32e5f54a8eba..641e9309ee08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6719,9 +6719,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * Update architecture specific hints for APIC
 		 * virtual interrupt delivery.
 		 */
-		if (vcpu->arch.apicv_active)
+		if (vcpu->arch.apicv_active) {
+			kvm_x86_ops->sync_pir_to_irr(vcpu);
 			kvm_x86_ops->hwapic_irr_update(vcpu,
 				kvm_lapic_find_highest_irr(vcpu));
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-- 
1.8.3.1


* [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-12-19 16:17 ` [PATCH 4/6] KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2017-02-07 20:19   ` Radim Krčmář
  2016-12-19 16:17 ` [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
  2017-02-07 17:23 ` [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
  6 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

Calls to apic_find_highest_irr are scanning IRR twice, once
in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
now that it does the computation, it can also do the RVI write.

In order to avoid complications in svm.c, make the callback optional.
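
The optional-callback pattern, reduced to a self-contained sketch
(hypothetical names; in the kernel the hook is
kvm_x86_ops->sync_pir_to_irr, left NULL by svm.c and cleared when
cpu_has_vmx_apicv() fails):

#include <stdio.h>

struct vcpu;				/* opaque in this sketch */

/* NULL means "there is no PIR to merge before reading the IRR". */
static int (*sync_pir_to_irr)(struct vcpu *vcpu);

static int find_highest_irr(struct vcpu *vcpu)
{
	(void)vcpu;
	return 42;			/* stand-in for a real IRR scan */
}

static int highest_pending_vector(struct vcpu *vcpu)
{
	if (sync_pir_to_irr)		/* merges the PIR, returns max IRR */
		return sync_pir_to_irr(vcpu);
	return find_highest_irr(vcpu);	/* fallback: scan the IRR only */
}

int main(void)
{
	printf("%d\n", highest_pending_vector(NULL));	/* hook unset */
	return 0;
}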

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            |  8 +++++---
 arch/x86/kvm/svm.c              |  6 ------
 arch/x86/kvm/vmx.c              | 30 ++++++++++++++++++------------
 arch/x86/kvm/x86.c              |  7 ++-----
 5 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b820b2b6bfa..b182b0ee0997 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -965,7 +965,7 @@ struct kvm_x86_ops {
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
-	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
+	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4af0e105caad..f644dd1dbe71 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,7 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 	 */
 	return apic_find_highest_irr(vcpu->arch.apic);
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);
 
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			     int vector, int level, int trig_mode,
@@ -580,9 +581,10 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
 {
 	int highest_irr;
-	if (apic->vcpu->arch.apicv_active)
-		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
-	highest_irr = apic_find_highest_irr(apic);
+	if (kvm_x86_ops->sync_pir_to_irr)
+		highest_irr = kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
+	else
+		highest_irr = apic_find_highest_irr(apic);
 	if (highest_irr == -1 || (highest_irr & 0xF0) <= ppr)
 		return -1;
 	return highest_irr;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 08a4d3ab3455..dc4b8834caee 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4357,11 +4357,6 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	return;
 }
 
-static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
-{
-	return;
-}
-
 static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
@@ -5371,7 +5366,6 @@ static inline void avic_post_state_restore(struct kvm_vcpu *vcpu)
 	.get_enable_apicv = svm_get_enable_apicv,
 	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
-	.sync_pir_to_irr = svm_sync_pir_to_irr,
 	.hwapic_irr_update = svm_hwapic_irr_update,
 	.hwapic_isr_update = svm_hwapic_isr_update,
 	.apicv_post_state_restore = avic_post_state_restore,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index eab8ab023705..27e40b180242 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6660,8 +6660,10 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;
 
-	if (!cpu_has_vmx_apicv())
+	if (!cpu_has_vmx_apicv()) {
 		enable_apicv = 0;
+		kvm_x86_ops->sync_pir_to_irr = NULL;
+	}
 
 	if (cpu_has_vmx_tsc_scaling()) {
 		kvm_has_tsc_control = true;
@@ -8734,20 +8736,24 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 	}
 }
 
-static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int max_irr;
 
-	if (!pi_test_on(&vmx->pi_desc))
-		return;
-
-	pi_clear_on(&vmx->pi_desc);
-	/*
-	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
-	 * But on x86 this is just a compiler barrier anyway.
-	 */
-	smp_mb__after_atomic();
-	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
+		pi_clear_on(&vmx->pi_desc);
+		/*
+		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
+		 * But on x86 this is just a compiler barrier anyway.
+		 */
+		smp_mb__after_atomic();
+		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+	} else {
+		max_irr = kvm_lapic_find_highest_irr(vcpu);
+	}
+	vmx_hwapic_irr_update(vcpu, max_irr);
+	return max_irr;
 }
 
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 641e9309ee08..c666414adc1d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2871,7 +2871,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
-	if (vcpu->arch.apicv_active)
+	if (kvm_x86_ops->sync_pir_to_irr)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
 	return kvm_apic_get_state(vcpu, s);
@@ -6719,11 +6719,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * Update architecture specific hints for APIC
 		 * virtual interrupt delivery.
 		 */
-		if (vcpu->arch.apicv_active) {
+		if (kvm_x86_ops->sync_pir_to_irr)
 			kvm_x86_ops->sync_pir_to_irr(vcpu);
-			kvm_x86_ops->hwapic_irr_update(vcpu,
-				kvm_lapic_find_highest_irr(vcpu));
-		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-- 
1.8.3.1


* [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-12-19 16:17 ` [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
@ 2016-12-19 16:17 ` Paolo Bonzini
  2017-02-07 19:58   ` Radim Krčmář
  2017-03-09  1:23   ` Wanpeng Li
  2017-02-07 17:23 ` [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
  6 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2016-12-19 16:17 UTC (permalink / raw)
  To: linux-kernel, kvm

Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.

Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.

However, without the KVM_REQ_EVENT safety net KVM needs to be much
more careful about races between vmx_deliver_posted_interrupt and
vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
If that happens, kvm_trigger_posted_interrupt returns true, but
smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
entered with PIR.ON, but the posted interrupt IPI has not been sent
and the interrupt is only delivered to the guest on the next vmentry
(if any).  To fix this, disable interrupts before setting vcpu->mode.
This ensures that the IPI is delayed until the guest enters non-root mode;
it is then trapped by the processor causing the interrupt to be injected.

Second, the IPI may be issued between

                        kvm_x86_ops->hwapic_irr_update(vcpu,
                                kvm_lapic_find_highest_irr(vcpu));

and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
but it (correctly) doesn't do anything because it sees vcpu->mode ==
OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
posted interrupt IPI is pending; this time, the fix for this is to move
the RVI update after IN_GUEST_MODE.

Both issues were previously masked by the liberal usage of KVM_REQ_EVENT.
In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
in another vmentry which would inject the interrupt.

This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
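
The required ordering can be modeled with C11 atomics (a sketch of the
invariant only, not the kernel code): each side does a store followed
by a load, and the barriers guarantee that at least one side observes
the other's store.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool pir_on;	/* models PIR.ON */
static atomic_int  mode;	/* 0 = OUTSIDE_GUEST_MODE, 1 = IN_GUEST_MODE */

/* Sender, modeled on vmx_deliver_posted_interrupt(). */
static void deliver(void (*send_pi_ipi)(void), void (*kick)(void))
{
	atomic_store(&pir_on, true);	/* pi_test_and_set_on(): full barrier */
	if (atomic_load(&mode) == 1)
		send_pi_ipi();	/* target has IRQs off, so the IPI is taken
				 * at vmentry, in non-root mode */
	else
		kick();		/* may do nothing when it sees
				 * OUTSIDE_GUEST_MODE; harmless, see below */
}

/* vCPU side, modeled on the reworked vcpu_enter_guest(); IRQs are
 * already off at this point. */
static bool recheck_before_entry(void)
{
	atomic_store(&mode, 1);		/* vcpu->mode = IN_GUEST_MODE */
	/* smp_mb__after_srcu_read_unlock() supplies the store-load fence */
	return atomic_load(&pir_on);	/* sync_pir_to_irr() */
}

With sequentially consistent ordering it is impossible for the sender
to read mode == OUTSIDE_GUEST_MODE (making the kick a no-op) while the
vCPU's later pir_on load reads false; that is exactly why calling
sync_pir_to_irr after setting vcpu->mode closes both windows above.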

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 11 ++++-------
 arch/x86/kvm/vmx.c   |  8 +++++---
 arch/x86/kvm/x86.c   | 44 +++++++++++++++++++++++++-------------------
 3 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f644dd1dbe71..5ea94b622e88 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -385,12 +385,8 @@ int __kvm_apic_update_irr(u32 *pir, void *regs)
 int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	int max_irr;
 
-	max_irr = __kvm_apic_update_irr(pir, apic->regs);
-
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
-	return max_irr;
+	return __kvm_apic_update_irr(pir, apic->regs);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
@@ -423,9 +419,10 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 	vcpu = apic->vcpu;
 
 	if (unlikely(vcpu->arch.apicv_active)) {
-		/* try to update RVI */
+		/* need to update RVI */
 		apic_clear_vector(vec, apic->regs + APIC_IRR);
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_x86_ops->hwapic_irr_update(vcpu,
+				apic_find_highest_irr(apic));
 	} else {
 		apic->irr_pending = false;
 		apic_clear_vector(vec, apic->regs + APIC_IRR);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 27e40b180242..3dd4fad35a3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5062,9 +5062,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
 		return;
 
-	r = pi_test_and_set_on(&vmx->pi_desc);
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
-	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
+	/* If a previous notification has sent the IPI, nothing to do.  */
+	if (pi_test_and_set_on(&vmx->pi_desc))
+		return;
+
+	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
 		kvm_vcpu_kick(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c666414adc1d..725473ba6dd3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6710,19 +6710,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_hv_process_stimers(vcpu);
 	}
 
-	/*
-	 * KVM_REQ_EVENT is not set when posted interrupts are set by
-	 * VT-d hardware, so we have to update RVI unconditionally.
-	 */
-	if (kvm_lapic_enabled(vcpu)) {
-		/*
-		 * Update architecture specific hints for APIC
-		 * virtual interrupt delivery.
-		 */
-		if (kvm_x86_ops->sync_pir_to_irr)
-			kvm_x86_ops->sync_pir_to_irr(vcpu);
-	}
-
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 		++vcpu->stat.req_event;
 		kvm_apic_accept_events(vcpu);
@@ -6767,20 +6754,39 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->prepare_guest_switch(vcpu);
 	if (vcpu->fpu_active)
 		kvm_load_guest_fpu(vcpu);
+
+	/*
+	 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
+	 * IPIs are then delayed until after guest entry, which ensures
+	 * that they result in virtual interrupt delivery.
+	 */
+	local_irq_disable();
 	vcpu->mode = IN_GUEST_MODE;
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
 	/*
-	 * We should set ->mode before check ->requests,
-	 * Please see the comment in kvm_make_all_cpus_request.
-	 * This also orders the write to mode from any reads
-	 * to the page tables done while the VCPU is running.
-	 * Please see the comment in kvm_flush_remote_tlbs.
+	 * 1) We should set ->mode before checking ->requests.  Please see
+	 * the comment in kvm_make_all_cpus_request.
+	 *
+	 * 2) For APICv, we should set ->mode before checking PIR.ON.  This
+	 * pairs with the memory barrier implicit in pi_test_and_set_on
+	 * (see vmx_deliver_posted_interrupt).
+	 *
+	 * 3) This also orders the write to mode from any reads to the page
+	 * tables done while the VCPU is running.  Please see the comment
+	 * in kvm_flush_remote_tlbs.
 	 */
 	smp_mb__after_srcu_read_unlock();
 
-	local_irq_disable();
+	if (kvm_lapic_enabled(vcpu)) {
+		/*
+		 * This handles the case where a posted interrupt was
+		 * notified with kvm_vcpu_kick.
+		 */
+		if (kvm_x86_ops->sync_pir_to_irr)
+			kvm_x86_ops->sync_pir_to_irr(vcpu);
+	}
 
 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
 	    || need_resched() || signal_pending(current)) {
-- 
1.8.3.1


* Re: [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
  2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-12-19 16:17 ` [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
@ 2017-02-07 17:23 ` Paolo Bonzini
  2017-02-07 21:52   ` Radim Krčmář
  6 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-02-07 17:23 UTC (permalink / raw)
  To: linux-kernel, kvm



On 19/12/2016 17:17, Paolo Bonzini wrote:
> These are the fixed patches that now pass vmx.flat.  The issue in
> "KVM: x86: do not scan IRR twice on APICv vmentry" was that,
> in v1 of the patches, kvm_cpu_get_interrupt did not scan IRR
> at all if it found PIR.ON=0.  This is now fixed in patches 4 and 5.
> 
> Another latent bug has to be fixed in patch 2.  The patch also
> highlights the maze that check_nested_events has become.  Fixing
> that is left for later.
> 
> The patches are on top of the (non-APICv-related) KVM_REQ_EVENT
> patches from this morning.  According to kvm-unit-tests, patch 6 speeds
> up self-IPIs (if not done through the accelerated self-IPI MSR) by 5-10%.
> 
> Paolo

Ping?

Paolo


* Re: [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC
  2016-12-19 16:17 ` [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC Paolo Bonzini
@ 2017-02-07 17:42   ` Radim Krčmář
  0 siblings, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 17:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-19 17:17+0100, Paolo Bonzini:
> Pending interrupts might be in the PI descriptor when the
> LAPIC is restored from an external state; we do not want
> them to be injected.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>



* Re: [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running
  2016-12-19 16:17 ` [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running Paolo Bonzini
@ 2017-02-07 18:16   ` Radim Krčmář
  0 siblings, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 18:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-19 17:17+0100, Paolo Bonzini:
> vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
> and the former does not call check_nested_events.
> 
> Once KVM_REQ_EVENT is removed from the APICv interrupt injection
> path, however, this would leave no place to trigger a vmexit
> from L2 to L1, causing a missed interrupt delivery while in guest
> mode.  This is caught by the "ack interrupt on exit" test in
> vmx.flat.
> 
> [This does not change the calls to check_nested_events in
>  inject_pending_event.  That is material for a separate cleanup.]
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

I think we should now remove the call to check_nested_events() from
inject_pending_event() and we could also call it in vcpu_enter_guest()
directly.

Still,

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>



* Re: [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups
  2016-12-19 16:17 ` [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups Paolo Bonzini
@ 2017-02-07 18:20   ` Radim Krčmář
  0 siblings, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 18:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-19 17:17+0100, Paolo Bonzini:
> Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
> Also move vmx_sync_pir_to_irr down in vmx.c, after vmx_hwapic_irr_update,
> so that a later patch can call vmx_hwapic_irr_update from it.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>



* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-12-19 16:17 ` [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
@ 2017-02-07 19:58   ` Radim Krčmář
  2017-02-08 16:23     ` Paolo Bonzini
  2017-03-09  1:23   ` Wanpeng Li
  1 sibling, 1 reply; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 19:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-19 17:17+0100, Paolo Bonzini:
> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> the complicated event injection path.
> 
> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
> there since APICv was introduced.
> 
> However, without the KVM_REQ_EVENT safety net KVM needs to be much
> more careful about races between vmx_deliver_posted_interrupt and
> vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
> between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
> If that happens, kvm_trigger_posted_interrupt returns true, but
> smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
> entered with PIR.ON, but the posted interrupt IPI has not been sent
> and the interrupt is only delivered to the guest on the next vmentry
> (if any).  To fix this, disable interrupts before setting vcpu->mode.
> This ensures that the IPI is delayed until the guest enters non-root mode;
> it is then trapped by the processor causing the interrupt to be injected.
> 
> Second, the IPI may be issued between
> 
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
> 
> and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
> but it (correctly) doesn't do anything because it sees vcpu->mode ==
> OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
> posted interrupt IPI is pending; this time, the fix for this is to move
> the RVI update after IN_GUEST_MODE.
> 
> Both issues were previously masked by the liberal usage of KVM_REQ_EVENT.
> In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
> in another vmentry which would inject the interrupt.
> 
> This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.

Please mention that this also fixes an existing problem with posted
interrupts from devices.  If we didn't check PIR.ON after disabling host
interrupts, we might delay delivery until the next VM exit or posted
interrupt.  (It was recently posted.)

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -6767,20 +6754,39 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);
> +
> +	/*
> +	 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
> +	 * IPIs are then delayed until after guest entry, which ensures
> +	 * that they result in virtual interrupt delivery.
> +	 */
> +	local_irq_disable();
>  	vcpu->mode = IN_GUEST_MODE;
>  
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>  
>  	/*
> -	 * We should set ->mode before check ->requests,
> -	 * Please see the comment in kvm_make_all_cpus_request.
> -	 * This also orders the write to mode from any reads
> -	 * to the page tables done while the VCPU is running.
> -	 * Please see the comment in kvm_flush_remote_tlbs.
> +	 * 1) We should set ->mode before checking ->requests.  Please see
> +	 * the comment in kvm_make_all_cpus_request.
> +	 *
> +	 * 2) For APICv, we should set ->mode before checking PIR.ON.  This
> +	 * pairs with the memory barrier implicit in pi_test_and_set_on
> +	 * (see vmx_deliver_posted_interrupt).
> +	 *
> +	 * 3) This also orders the write to mode from any reads to the page
> +	 * tables done while the VCPU is running.  Please see the comment
> +	 * in kvm_flush_remote_tlbs.
>  	 */
>  	smp_mb__after_srcu_read_unlock();
> -	local_irq_disable();
> +	if (kvm_lapic_enabled(vcpu)) {
> +		/*
> +		 * This handles the case where a posted interrupt was
> +		 * notified with kvm_vcpu_kick.
> +		 */
> +		if (kvm_x86_ops->sync_pir_to_irr)
> +			kvm_x86_ops->sync_pir_to_irr(vcpu);

Hm, this is not working well when nesting while L1 has assigned devices:
if the posted interrupt arrives just before local_irq_disable(), then
we'll just enter L2 instead of doing a nested VM exit (in case we have
interrupt exiting).

And after reading the code a bit, I think we allow posted interrupts in
L2 while L1 has assigned devices that use posted interrupts, and that this
doesn't work.

Am I missing something?

Thanks.


* Re: [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-12-19 16:17 ` [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
@ 2017-02-07 20:19   ` Radim Krčmář
  2017-02-07 21:49     ` Radim Krčmář
  2017-02-08 14:10     ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 20:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-19 17:17+0100, Paolo Bonzini:
> Calls to apic_find_highest_irr are scanning IRR twice, once
> in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
> sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
> now that it does the computation, it can also do the RVI write.
> 
> In order to avoid complications in svm.c, make the callback optional.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -8734,20 +8736,24 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  	}
>  }
>  
> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int max_irr;
>  
> -	if (!pi_test_on(&vmx->pi_desc))
> -		return;
> -
> -	pi_clear_on(&vmx->pi_desc);
> -	/*
> -	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> -	 * But on x86 this is just a compiler barrier anyway.
> -	 */
> -	smp_mb__after_atomic();
> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> +	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
> +		pi_clear_on(&vmx->pi_desc);
> +		/*
> +		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> +		 * But on x86 this is just a compiler barrier anyway.
> +		 */
> +		smp_mb__after_atomic();
> +		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> +	} else {
> +		max_irr = kvm_lapic_find_highest_irr(vcpu);
> +	}
> +	vmx_hwapic_irr_update(vcpu, max_irr);

Btw. a v1 discussion revolved around the need to have
vmx_hwapic_irr_update() here when the maximal IRR should always be in
RVI, and, uh, I didn't follow up (negligible attention span) ...

There is one place where that doesn't hold: we don't update RVI after an
EXTERNAL_INTERRUPT nested VM exit without VM_EXIT_ACK_INTR_ON_EXIT, but
IRR has likely changed.  Isn't that the problem?


* Re: [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry
  2017-02-07 20:19   ` Radim Krčmář
@ 2017-02-07 21:49     ` Radim Krčmář
  2017-02-08 14:10     ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 21:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-02-07 21:19+0100, Radim Krčmář:
> 2016-12-19 17:17+0100, Paolo Bonzini:
> > Calls to apic_find_highest_irr are scanning IRR twice, once
> > in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
> > sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
> > now that it does the computation, it can also do the RVI write.
> > 
> > In order to avoid complications in svm.c, make the callback optional.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > @@ -8734,20 +8736,24 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> >  	}
> >  }
> >  
> > -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > +static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +	int max_irr;
> >  
> > -	if (!pi_test_on(&vmx->pi_desc))
> > -		return;
> > -
> > -	pi_clear_on(&vmx->pi_desc);
> > -	/*
> > -	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> > -	 * But on x86 this is just a compiler barrier anyway.
> > -	 */
> > -	smp_mb__after_atomic();
> > -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> > +	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
> > +		pi_clear_on(&vmx->pi_desc);
> > +		/*
> > +		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> > +		 * But on x86 this is just a compiler barrier anyway.
> > +		 */
> > +		smp_mb__after_atomic();
> > +		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> > +	} else {
> > +		max_irr = kvm_lapic_find_highest_irr(vcpu);
> > +	}
> > +	vmx_hwapic_irr_update(vcpu, max_irr);
> 
> Btw. a v1 discussion revolved around the need to have
> vmx_hwapic_irr_update() here when the maximal IRR should always be in
> RVI, and, uh, I didn't follow up (negligible attention span) ...
> 
> There is one place where that doesn't hold: we don't update RVI after an
> EXTERNAL_INTERRUPT nested VM exit without VM_EXIT_ACK_INTR_ON_EXIT, but
> IRR has likely changed.  Isn't that the problem?

The following patch on top of the whole series survives kvm-unit-tests
and very minimal testing (not sure if I have missed something):

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a1e9cab7d01f..8b98c1681803 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -577,10 +577,10 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 
 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
 {
-	int highest_irr;
+	int highest_irr = -1;
 	if (kvm_x86_ops->sync_pir_to_irr)
 		highest_irr = kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
-	else
+	if (highest_irr == -1)
 		highest_irr = apic_find_highest_irr(apic);
 	if (highest_irr == -1 || (highest_irr & 0xF0) <= ppr)
 		return -1;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9c8a16edf88d..637c7bd2f3ab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8728,18 +8728,18 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int max_irr;
 
-	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
-		pi_clear_on(&vmx->pi_desc);
-		/*
-		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
-		 * But on x86 this is just a compiler barrier anyway.
-		 */
-		smp_mb__after_atomic();
-		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
-	} else {
-		max_irr = kvm_lapic_find_highest_irr(vcpu);
-	}
+	if (!vcpu->arch.apicv_active || !pi_test_on(&vmx->pi_desc))
+		return -1;
+
+	pi_clear_on(&vmx->pi_desc);
+	/*
+	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
+	 * But on x86 this is just a compiler barrier anyway.
+	 */
+	smp_mb__after_atomic();
+	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
 	vmx_hwapic_irr_update(vcpu, max_irr);
+
 	return max_irr;
 }
 
@@ -11145,6 +11145,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 
 	/* in case we halted in L2 */
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+	kvm_x86_ops->hwapic_irr_update(vcpu, kvm_lapic_find_highest_irr(vcpu));
 }
 
 /*


* Re: [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
  2017-02-07 17:23 ` [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
@ 2017-02-07 21:52   ` Radim Krčmář
  2017-02-08 10:04     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Radim Krčmář @ 2017-02-07 21:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-02-07 18:23+0100, Paolo Bonzini:
> On 19/12/2016 17:17, Paolo Bonzini wrote:
>> These are the fixed patches that now pass vmx.flat.  The issue in
>> "KVM: x86: do not scan IRR twice on APICv vmentry" was that,
>> in v1 of the patches, kvm_cpu_get_interrupt did not scan IRR
>> at all if it found PIR.ON=0.  This is now fixed in patches 4 and 5.
>> 
>> Another latent bug has to be fixed in patch 2.  The patch also
>> highlights the maze that check_nested_events has become.  Fixing
>> that is left for later.
>> 
>> The patches are on top of the (non-APICv-related) KVM_REQ_EVENT
>> patches from this morning.  According to kvm-unit-tests, patch 6 speeds
>> up self-IPIs (if not done through the accelerated self-IPI MSR) by 5-10%.
>> 
>> Paolo
> 
> Ping?

I think the patches are ready to be applied -- there might be some rough
edges with nested, but it was broken even before, and they fix at least
one known bug.


* Re: [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
  2017-02-07 21:52   ` Radim Krčmář
@ 2017-02-08 10:04     ` Paolo Bonzini
  2017-02-08 13:33       ` Radim Krčmář
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-02-08 10:04 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm



On 07/02/2017 22:52, Radim Krčmář wrote:
> there might be some rough
> edges with nested, but it was broken even before, and they fix at least
> one known bug.

I don't think so; nested IRQ injection is tested very well.  Why do you
say it was broken even before?

Paolo


* Re: [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
  2017-02-08 10:04     ` Paolo Bonzini
@ 2017-02-08 13:33       ` Radim Krčmář
  2017-02-08 15:01         ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Radim Krčmář @ 2017-02-08 13:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-02-08 11:04+0100, Paolo Bonzini:
> On 07/02/2017 22:52, Radim Krčmář wrote:
>> there might be some rough
>> edges with nested, but it was broken even before, and they fix at least
>> one known bug.
> 
> I don't think so; nested IRQ injection is tested very well.  Why do you
> say it was broken even before?

I was basing that on a recent bug report where we missed an IOMMU
injection in a non-nested case [1].  While we do cover the non-nested case with this
series, I think we still miss posted interrupts from IOMMU that should
trigger a nested VM exit from L2 to L1 -- details in review of [6/6].

1: http://www.spinics.net/lists/kvm/msg144355.html


* Re: [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry
  2017-02-07 20:19   ` Radim Krčmář
  2017-02-07 21:49     ` Radim Krčmář
@ 2017-02-08 14:10     ` Paolo Bonzini
  2017-02-08 14:24       ` Radim Krčmář
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-02-08 14:10 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm



On 07/02/2017 21:19, Radim Krčmář wrote:
> 2016-12-19 17:17+0100, Paolo Bonzini:
>> Calls to apic_find_highest_irr are scanning IRR twice, once
>> in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
>> sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
>> now that it does the computation, it can also do the RVI write.
>>
>> In order to avoid complications in svm.c, make the callback optional.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -8734,20 +8736,24 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>>  	}
>>  }
>>  
>> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>> +static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +	int max_irr;
>>  
>> -	if (!pi_test_on(&vmx->pi_desc))
>> -		return;
>> -
>> -	pi_clear_on(&vmx->pi_desc);
>> -	/*
>> -	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>> -	 * But on x86 this is just a compiler barrier anyway.
>> -	 */
>> -	smp_mb__after_atomic();
>> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>> +	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
>> +		pi_clear_on(&vmx->pi_desc);
>> +		/*
>> +		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>> +		 * But on x86 this is just a compiler barrier anyway.
>> +		 */
>> +		smp_mb__after_atomic();
>> +		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>> +	} else {
>> +		max_irr = kvm_lapic_find_highest_irr(vcpu);
>> +	}
>> +	vmx_hwapic_irr_update(vcpu, max_irr);
> 
> Btw. a v1 discussion revolved about the need to have
> vmx_hwapic_irr_update() here when the maximal IRR should always be in
> RVI, and, uh, I didn't follow up (negligible attention span) ...
>
> There is one place where that doesn't hold: we don't update RVI after a
> EXTERNAL_INTERRUPT nested VM exit without VM_EXIT_ACK_INTR_ON_EXIT, but
> IRR has likely changed.  Isn't that the problem?

I'm not sure... there shouldn't be any issue with missed RVI updates in
this series, since it does

	if (kvm_lapic_enabled(vcpu)) {
		/*
		 * This handles the case where a posted interrupt was
		 * notified with kvm_vcpu_kick.
		 */
		if (kvm_x86_ops->sync_pir_to_irr)
			kvm_x86_ops->sync_pir_to_irr(vcpu);
	}

on every VM entry (and kvm_lapic_find_highest_irr inside the callback).
That is not something I really like, but it's no worse than what was
there before

                if (vcpu->arch.apicv_active)
                        kvm_x86_ops->hwapic_irr_update(vcpu,
                                kvm_lapic_find_highest_irr(vcpu));
        }

and obviously better than going unnecessarily through KVM_REQ_EVENT
processing.

Paolo


* Re: [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry
  2017-02-08 14:10     ` Paolo Bonzini
@ 2017-02-08 14:24       ` Radim Krčmář
  0 siblings, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-08 14:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-02-08 15:10+0100, Paolo Bonzini:
> 
> 
> On 07/02/2017 21:19, Radim Krčmář wrote:
> > 2016-12-19 17:17+0100, Paolo Bonzini:
> >> Calls to apic_find_highest_irr are scanning IRR twice, once
> >> in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
> >> sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
> >> now that it does the computation, it can also do the RVI write.
> >>
> >> In order to avoid complications in svm.c, make the callback optional.
> >>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> @@ -8734,20 +8736,24 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> >>  	}
> >>  }
> >>  
> >> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> >> +static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> >>  {
> >>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> +	int max_irr;
> >>  
> >> -	if (!pi_test_on(&vmx->pi_desc))
> >> -		return;
> >> -
> >> -	pi_clear_on(&vmx->pi_desc);
> >> -	/*
> >> -	 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> >> -	 * But on x86 this is just a compiler barrier anyway.
> >> -	 */
> >> -	smp_mb__after_atomic();
> >> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> >> +	if (vcpu->arch.apicv_active && pi_test_on(&vmx->pi_desc)) {
> >> +		pi_clear_on(&vmx->pi_desc);
> >> +		/*
> >> +		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> >> +		 * But on x86 this is just a compiler barrier anyway.
> >> +		 */
> >> +		smp_mb__after_atomic();
> >> +		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> >> +	} else {
> >> +		max_irr = kvm_lapic_find_highest_irr(vcpu);
> >> +	}
> >> +	vmx_hwapic_irr_update(vcpu, max_irr);
> > 
> > Btw. a v1 discussion revolved about the need to have
> > vmx_hwapic_irr_update() here when the maximal IRR should always be in
> > RVI, and, uh, I didn't follow up (negligible attention span) ...
> >
> > There is one place where that doesn't hold: we don't update RVI after an
> > EXTERNAL_INTERRUPT nested VM exit without VM_EXIT_ACK_INTR_ON_EXIT, but
> > IRR has likely changed.  Isn't that the problem?
> 
> I'm not sure... there shouldn't be any issue with missed RVI updates in
> this series, since it does

> 	if (kvm_lapic_enabled(vcpu)) {
> 		/*
> 		 * This handles the case where a posted interrupt was
> 		 * notified with kvm_vcpu_kick.
> 		 */
> 		if (kvm_x86_ops->sync_pir_to_irr)
> 			kvm_x86_ops->sync_pir_to_irr(vcpu);
> 	}
> 
> on every VM entry (and kvm_lapic_find_highest_irr inside the callback).
> That is not something I really like, but it's no worse than what was
> there before
> 
>                 if (vcpu->arch.apicv_active)
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
>         }
> 
> and obviously better than going unnecessarily through KVM_REQ_EVENT
> processing.

I agree.  I wanted to point out that we could get rid of the RVI update
on VM entry when PI.ON is clear, like you originally planned (because
RVI should already be the max IRR).

And the reason why v1 didn't work out was likely the missing RVI
update on nested VM exit, which forced v2 to have this work-around.
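
With that fixed, the VM entry callback could indeed skip the RVI write
whenever PIR.ON is clear -- roughly like this (only a sketch of the
simplification, not something in this series):

        static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
        {
                struct vcpu_vmx *vmx = to_vmx(vcpu);
                int max_irr;

                if (!vcpu->arch.apicv_active || !pi_test_on(&vmx->pi_desc))
                        /* no new postings: RVI already holds the max IRR */
                        return kvm_lapic_find_highest_irr(vcpu);

                pi_clear_on(&vmx->pi_desc);
                /* the IOMMU can write PIR.ON, so the barrier matters even on UP */
                smp_mb__after_atomic();
                max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
                vmx_hwapic_irr_update(vcpu, max_irr);
                return max_irr;
        }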

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv
  2017-02-08 13:33       ` Radim Krčmář
@ 2017-02-08 15:01         ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2017-02-08 15:01 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm



On 08/02/2017 14:33, Radim Krčmář wrote:
> 2017-02-08 11:04+0100, Paolo Bonzini:
>> On 07/02/2017 22:52, Radim Krčmář wrote:
>>> there might be some rough
>>> edges with nested, but it was broken even before, and they fix at least
>>> one known bug.
>>
>> I don't think so; nested IRQ injection is tested very well.  Why do you
>> say it was broken even before?
> 
> I was basing that on a recent bug report where we missed IOMMU injection in a
> non-nested case [1].  While we do cover the non-nested case with this
> series, I think we still miss posted interrupts from IOMMU that should
> trigger a nested VM exit from L2 to L1 -- details in review of [6/6].
> 
> 1: http://www.spinics.net/lists/kvm/msg144355.html

Uh, nice.  Totally missed that.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2017-02-07 19:58   ` Radim Krčmář
@ 2017-02-08 16:23     ` Paolo Bonzini
  2017-02-09 15:11       ` Radim Krčmář
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-02-08 16:23 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Wu, Feng



On 07/02/2017 20:58, Radim Krčmář wrote:
>> -	local_irq_disable();
>> +	if (kvm_lapic_enabled(vcpu)) {
>> +		/*
>> +		 * This handles the case where a posted interrupt was
>> +		 * notified with kvm_vcpu_kick.
>> +		 */
>> +		if (kvm_x86_ops->sync_pir_to_irr)
>> +			kvm_x86_ops->sync_pir_to_irr(vcpu);
> Hm, this is not working well when nesting while L1 has assigned devices:
> if the posted interrupt arrives just before local_irq_disable(), then
> we'll just enter L2 instead of doing a nested VM exit (in case we have
> interrupt exiting).
> 
> And after reading the code a bit, I think we allow posted interrupts in
> L2 while L1 has assigned devices that use posted interrupts, and that it
> doesn't work.

So you mean the interrupt is delivered to L2?  The fix would be to wrap
L2 entry and exit with some subset of pi_pre_block/pi_post_block.
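
Say, something modelled on pi_pre_block (vmx_pi_enter_l2 is a hypothetical
name, and this is only a sketch of the descriptor update; the real fix
would also need the reverse operation on the way out of L2):

        static void vmx_pi_enter_l2(struct kvm_vcpu *vcpu)
        {
                struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
                struct pi_desc old, new;

                do {
                        old.control = new.control = pi_desc->control;
                        /* notifications for L1 must now trap to the host */
                        new.nv = POSTED_INTR_WAKEUP_VECTOR;
                } while (cmpxchg(&pi_desc->control, old.control,
                                 new.control) != old.control);
        }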

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2017-02-08 16:23     ` Paolo Bonzini
@ 2017-02-09 15:11       ` Radim Krčmář
  0 siblings, 0 replies; 25+ messages in thread
From: Radim Krčmář @ 2017-02-09 15:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Wu, Feng

2017-02-08 17:23+0100, Paolo Bonzini:
> On 07/02/2017 20:58, Radim Krčmář wrote:
>>> -	local_irq_disable();
>>> +	if (kvm_lapic_enabled(vcpu)) {
>>> +		/*
>>> +		 * This handles the case where a posted interrupt was
>>> +		 * notified with kvm_vcpu_kick.
>>> +		 */
>>> +		if (kvm_x86_ops->sync_pir_to_irr)
>>> +			kvm_x86_ops->sync_pir_to_irr(vcpu);
>> Hm, this is not working well when nesting while L1 has assigned devices:
>> if the posted interrupt arrives just before local_irq_disable(), then
>> we'll just enter L2 instead of doing a nested VM exit (in case we have
>> interrupt exiting).
>> 
>> And after reading the code a bit, I think we allow posted interrupts in
>> L2 while L1 has assigned devices that use posted interrupts, and that it
>> doesn't work.
> 
> So you mean the interrupt is delivered to L2?  The fix would be to wrap
> L2 entry and exit with some subset of pi_pre_block/pi_post_block.

I hope not, as their PI structures are separate, so we'd just be
delaying the interrupt injection to L1.  The CPU running the L2 guest
will notice a posted notification, but L2's PIR.ON might not be set.
L1's PIR.ON will be set, but no one is going to care until the next VM
exit.

I'll add some unit tests to check that I understood the bug correctly.

Changing the notification vector for L2 would be an OK solution.
We'd reserve a new vector in L0 and check L1's interrupts.  If one
targeted a VCPU that is currently in L2 with a notification vector
configured for L2, we'd translate that vector into the notification
vector we reserved for L2 -- L1 could then post interrupts to L2 without
a VM exit.  And "posted" interrupts for L1 while in L2 would trigger a
VM exit, because the notification vector would be different.
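
Roughly like this on L0's delivery path (a sketch only; for_l2 and
POSTED_INTR_NESTED_VECTOR are made-up names for "the interrupt matched
the notification vector L1 configured for L2" and for the newly reserved
vector):

        static void post_interrupt(struct kvm_vcpu *vcpu, bool for_l2)
        {
                int vec = POSTED_INTR_VECTOR;

                if (is_guest_mode(vcpu) && for_l2)
                        /* matches vmcs02's NV: processed in non-root mode */
                        vec = POSTED_INTR_NESTED_VECTOR;
                /*
                 * Otherwise, while in L2 the vector differs from vmcs02's
                 * NV field, so the CPU takes an external-interrupt VM exit
                 * and L0 can forward the event to L1.
                 */
                apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), vec);
        }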

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-12-19 16:17 ` [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
  2017-02-07 19:58   ` Radim Krčmář
@ 2017-03-09  1:23   ` Wanpeng Li
  2017-03-09  9:40     ` Wanpeng Li
  1 sibling, 1 reply; 25+ messages in thread
From: Wanpeng Li @ 2017-03-09  1:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2016-12-20 0:17 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> the complicated event injection path.
>
> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
> there since APICv was introduced.
>
> However, without the KVM_REQ_EVENT safety net KVM needs to be much
> more careful about races between vmx_deliver_posted_interrupt and
> vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
> between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
> If that happens, kvm_vcpu_trigger_posted_interrupt returns true, but
> smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
> entered with PIR.ON, but the posted interrupt IPI has not been sent
> and the interrupt is only delivered to the guest on the next vmentry
> (if any).  To fix this, disable interrupts before setting vcpu->mode.
> This ensures that the IPI is delayed until the guest enters non-root mode;
> it is then trapped by the processor causing the interrupt to be injected.
>
> Second, the IPI may be issued between
>
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
>
> and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
> but it (correctly) doesn't do anything because it sees vcpu->mode ==
> OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
> posted interrupt IPI is pending; this time, the fix for this is to move
> the RVI update after IN_GUEST_MODE.
>
> Both issues were previously masked by the liberal usage of KVM_REQ_EVENT.
> In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
> in another vmentry which would inject the interrupt.
>
> This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 11 ++++-------
>  arch/x86/kvm/vmx.c   |  8 +++++---
>  arch/x86/kvm/x86.c   | 44 +++++++++++++++++++++++++-------------------
>  3 files changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f644dd1dbe71..5ea94b622e88 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -385,12 +385,8 @@ int __kvm_apic_update_irr(u32 *pir, void *regs)
>  int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  {
>         struct kvm_lapic *apic = vcpu->arch.apic;
> -       int max_irr;
>
> -       max_irr = __kvm_apic_update_irr(pir, apic->regs);
> -
> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
> -       return max_irr;
> +       return __kvm_apic_update_irr(pir, apic->regs);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>
> @@ -423,9 +419,10 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
>         vcpu = apic->vcpu;
>
>         if (unlikely(vcpu->arch.apicv_active)) {
> -               /* try to update RVI */
> +               /* need to update RVI */
>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
> -               kvm_make_request(KVM_REQ_EVENT, vcpu);
> +               kvm_x86_ops->hwapic_irr_update(vcpu,
> +                               apic_find_highest_irr(apic));
>         } else {
>                 apic->irr_pending = false;
>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 27e40b180242..3dd4fad35a3e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5062,9 +5062,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>         if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>                 return;
>
> -       r = pi_test_and_set_on(&vmx->pi_desc);
> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
> -       if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
> +       /* If a previous notification has sent the IPI, nothing to do.  */
> +       if (pi_test_and_set_on(&vmx->pi_desc))
> +               return;
> +
> +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>                 kvm_vcpu_kick(vcpu);
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c666414adc1d..725473ba6dd3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6710,19 +6710,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                         kvm_hv_process_stimers(vcpu);
>         }
>
> -       /*
> -        * KVM_REQ_EVENT is not set when posted interrupts are set by
> -        * VT-d hardware, so we have to update RVI unconditionally.
> -        */
> -       if (kvm_lapic_enabled(vcpu)) {
> -               /*
> -                * Update architecture specific hints for APIC
> -                * virtual interrupt delivery.
> -                */
> -               if (kvm_x86_ops->sync_pir_to_irr)
> -                       kvm_x86_ops->sync_pir_to_irr(vcpu);
> -       }
> -
>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>                 ++vcpu->stat.req_event;
>                 kvm_apic_accept_events(vcpu);
> @@ -6767,20 +6754,39 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         kvm_x86_ops->prepare_guest_switch(vcpu);
>         if (vcpu->fpu_active)
>                 kvm_load_guest_fpu(vcpu);
> +
> +       /*
> +        * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
> +        * IPIs are then delayed until after guest entry, which ensures that they
> +        * result in virtual interrupt delivery.
> +        */
> +       local_irq_disable();
>         vcpu->mode = IN_GUEST_MODE;
>
>         srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>
>         /*
> -        * We should set ->mode before check ->requests,
> -        * Please see the comment in kvm_make_all_cpus_request.
> -        * This also orders the write to mode from any reads
> -        * to the page tables done while the VCPU is running.
> -        * Please see the comment in kvm_flush_remote_tlbs.
> +        * 1) We should set ->mode before checking ->requests.  Please see
> +        * the comment in kvm_make_all_cpus_request.
> +        *
> +        * 2) For APICv, we should set ->mode before checking PIR.ON.  This
> +        * pairs with the memory barrier implicit in pi_test_and_set_on
> +        * (see vmx_deliver_posted_interrupt).
> +        *
> +        * 3) This also orders the write to mode from any reads to the page
> +        * tables done while the VCPU is running.  Please see the comment
> +        * in kvm_flush_remote_tlbs.
>          */
>         smp_mb__after_srcu_read_unlock();
>
> -       local_irq_disable();

The local_irq_disable() movement is unnecessary if you move sync_pir_to_irr.

- If the IPI arrives after vcpu->mode = IN_GUEST_MODE and the interrupt
disable, posted interrupt delivery succeeds.
- If the IPI arrives between vcpu->mode = IN_GUEST_MODE and the interrupt
disable, sync_pir_to_irr will catch the PIR update and set RVI; see the
sketch below.
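
I.e. the entry path would be ordered like this (a sketch of the proposal,
not a tested patch):

        vcpu->mode = IN_GUEST_MODE;

        srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
        smp_mb__after_srcu_read_unlock();

        local_irq_disable();    /* kept in its old position */

        if (kvm_lapic_enabled(vcpu)) {
                /*
                 * Catches a posted interrupt IPI that landed between
                 * IN_GUEST_MODE and local_irq_disable(); any later IPI
                 * stays pending and is delivered as a virtual interrupt
                 * once the processor is in non-root mode.
                 */
                if (kvm_x86_ops->sync_pir_to_irr)
                        kvm_x86_ops->sync_pir_to_irr(vcpu);
        }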

Regards,
Wanpeng Li

> +       if (kvm_lapic_enabled(vcpu)) {
> +               /*
> +                * This handles the case where a posted interrupt was
> +                * notified with kvm_vcpu_kick.
> +                */
> +               if (kvm_x86_ops->sync_pir_to_irr)
> +                       kvm_x86_ops->sync_pir_to_irr(vcpu);
> +       }
>
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>             || need_resched() || signal_pending(current)) {
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2017-03-09  1:23   ` Wanpeng Li
@ 2017-03-09  9:40     ` Wanpeng Li
  2017-03-09 10:03       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Wanpeng Li @ 2017-03-09  9:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-03-09 9:23 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2016-12-20 0:17 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
>> is blocked", 2015-09-18) the posted interrupt descriptor is checked
>> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
>> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
>> the complicated event injection path.
>>
>> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
>> there since APICv was introduced.
>>
>> However, without the KVM_REQ_EVENT safety net KVM needs to be much
>> more careful about races between vmx_deliver_posted_interrupt and
>> vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
>> between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
>> If that happens, kvm_vcpu_trigger_posted_interrupt returns true, but
>> smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
>> entered with PIR.ON, but the posted interrupt IPI has not been sent
>> and the interrupt is only delivered to the guest on the next vmentry
>> (if any).  To fix this, disable interrupts before setting vcpu->mode.
>> This ensures that the IPI is delayed until the guest enters non-root mode;
>> it is then trapped by the processor causing the interrupt to be injected.
>>
>> Second, the IPI may be issued between
>>
>>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>>                                 kvm_lapic_find_highest_irr(vcpu));
>>
>> and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
>> but it (correctly) doesn't do anything because it sees vcpu->mode ==
>> OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
>> posted interrupt IPI is pending; this time, the fix for this is to move
>> the RVI update after IN_GUEST_MODE.
>>
>> Both issues were previously masked by the liberal usage of KVM_REQ_EVENT.
>> In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
>> in another vmentry which would inject the interrupt.
>>
>> This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/lapic.c | 11 ++++-------
>>  arch/x86/kvm/vmx.c   |  8 +++++---
>>  arch/x86/kvm/x86.c   | 44 +++++++++++++++++++++++++-------------------
>>  3 files changed, 34 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index f644dd1dbe71..5ea94b622e88 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -385,12 +385,8 @@ int __kvm_apic_update_irr(u32 *pir, void *regs)
>>  int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>>  {
>>         struct kvm_lapic *apic = vcpu->arch.apic;
>> -       int max_irr;
>>
>> -       max_irr = __kvm_apic_update_irr(pir, apic->regs);
>> -
>> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -       return max_irr;
>> +       return __kvm_apic_update_irr(pir, apic->regs);
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>>
>> @@ -423,9 +419,10 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
>>         vcpu = apic->vcpu;
>>
>>         if (unlikely(vcpu->arch.apicv_active)) {
>> -               /* try to update RVI */
>> +               /* need to update RVI */
>>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
>> -               kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +               kvm_x86_ops->hwapic_irr_update(vcpu,
>> +                               apic_find_highest_irr(apic));
>>         } else {
>>                 apic->irr_pending = false;
>>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 27e40b180242..3dd4fad35a3e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -5062,9 +5062,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>>         if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>>                 return;
>>
>> -       r = pi_test_and_set_on(&vmx->pi_desc);
>> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -       if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
>> +       /* If a previous notification has sent the IPI, nothing to do.  */
>> +       if (pi_test_and_set_on(&vmx->pi_desc))
>> +               return;
>> +
>> +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>>                 kvm_vcpu_kick(vcpu);
>>  }
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c666414adc1d..725473ba6dd3 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6710,19 +6710,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>                         kvm_hv_process_stimers(vcpu);
>>         }
>>
>> -       /*
>> -        * KVM_REQ_EVENT is not set when posted interrupts are set by
>> -        * VT-d hardware, so we have to update RVI unconditionally.
>> -        */
>> -       if (kvm_lapic_enabled(vcpu)) {
>> -               /*
>> -                * Update architecture specific hints for APIC
>> -                * virtual interrupt delivery.
>> -                */
>> -               if (kvm_x86_ops->sync_pir_to_irr)
>> -                       kvm_x86_ops->sync_pir_to_irr(vcpu);
>> -       }
>> -
>>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>                 ++vcpu->stat.req_event;
>>                 kvm_apic_accept_events(vcpu);
>> @@ -6767,20 +6754,39 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>         kvm_x86_ops->prepare_guest_switch(vcpu);
>>         if (vcpu->fpu_active)
>>                 kvm_load_guest_fpu(vcpu);
>> +
>> +       /*
>> +        * Disabling IRQs before setting IN_GUEST_MODE.  Posted interrupt
>> +        * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
>> +        * IPIs are then delayed until after guest entry, which ensures that they
>> +        */
>> +       local_irq_disable();
>>         vcpu->mode = IN_GUEST_MODE;
>>
>>         srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>>
>>         /*
>> -        * We should set ->mode before check ->requests,
>> -        * Please see the comment in kvm_make_all_cpus_request.
>> -        * This also orders the write to mode from any reads
>> -        * to the page tables done while the VCPU is running.
>> -        * Please see the comment in kvm_flush_remote_tlbs.
>> +        * 1) We should set ->mode before checking ->requests.  Please see
>> +        * the comment in kvm_make_all_cpus_request.
>> +        *
>> +        * 2) For APICv, we should set ->mode before checking PIR.ON.  This
>> +        * pairs with the memory barrier implicit in pi_test_and_set_on
>> +        * (see vmx_deliver_posted_interrupt).
>> +        *
>> +        * 3) This also orders the write to mode from any reads to the page
>> +        * tables done while the VCPU is running.  Please see the comment
>> +        * in kvm_flush_remote_tlbs.
>>          */
>>         smp_mb__after_srcu_read_unlock();
>>
>> -       local_irq_disable();
>
> The local_irq_disable() movement is unnecessary if you move sync_pir_to_irr.

In addition, this movement lengthens the interrupt-disabled window to
some degree.  Do you think I can send a patch to revert it?

Regards,
Wanpeng Li

>
> - If the IPI arrives after vcpu->mode = IN_GUEST_MODE and the interrupt
> disable, posted interrupt delivery succeeds.
> - If the IPI arrives between vcpu->mode = IN_GUEST_MODE and the interrupt
> disable, sync_pir_to_irr will catch the PIR update and set RVI.
>
> Regards,
> Wanpeng Li
>
>> +       if (kvm_lapic_enabled(vcpu)) {
>> +               /*
>> +                * This handles the case where a posted interrupt was
>> +                * notified with kvm_vcpu_kick.
>> +                */
>> +               if (kvm_x86_ops->sync_pir_to_irr)
>> +                       kvm_x86_ops->sync_pir_to_irr(vcpu);
>> +       }
>>
>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>             || need_resched() || signal_pending(current)) {
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2017-03-09  9:40     ` Wanpeng Li
@ 2017-03-09 10:03       ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-09 10:03 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm



On 09/03/2017 10:40, Wanpeng Li wrote:
> 2017-03-09 9:23 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>> 2016-12-20 0:17 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
>>> is blocked", 2015-09-18) the posted interrupt descriptor is checked
>>> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
>>> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
>>> the complicated event injection path.
>>>
>>> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
>>> there since APICv was introduced.
>>>
>>> However, without the KVM_REQ_EVENT safety net KVM needs to be much
>>> more careful about races between vmx_deliver_posted_interrupt and
>>> vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
>>> between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
>>> If that happens, kvm_vcpu_trigger_posted_interrupt returns true, but
>>> smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
>>> entered with PIR.ON, but the posted interrupt IPI has not been sent
>>> and the interrupt is only delivered to the guest on the next vmentry
>>> (if any).  To fix this, disable interrupts before setting vcpu->mode.
>>> This ensures that the IPI is delayed until the guest enters non-root mode;
>>> it is then trapped by the processor causing the interrupt to be injected.
>>>
>>> Second, the IPI may be issued between
>>>
>>>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>>>                                 kvm_lapic_find_highest_irr(vcpu));
>>>
>>> and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
>>> but it (correctly) doesn't do anything because it sees vcpu->mode ==
>>> OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
>>> posted interrupt IPI is pending; this time, the fix for this is to move
>>> the RVI update after IN_GUEST_MODE.
>>>
>>> Both issues were previously masked by the liberal usage of KVM_REQ_EVENT.
>>> In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
>>> in another vmentry which would inject the interrupt.
>>>
>>> This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  arch/x86/kvm/lapic.c | 11 ++++-------
>>>  arch/x86/kvm/vmx.c   |  8 +++++---
>>>  arch/x86/kvm/x86.c   | 44 +++++++++++++++++++++++++-------------------
>>>  3 files changed, 34 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index f644dd1dbe71..5ea94b622e88 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -385,12 +385,8 @@ int __kvm_apic_update_irr(u32 *pir, void *regs)
>>>  int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>>>  {
>>>         struct kvm_lapic *apic = vcpu->arch.apic;
>>> -       int max_irr;
>>>
>>> -       max_irr = __kvm_apic_update_irr(pir, apic->regs);
>>> -
>>> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> -       return max_irr;
>>> +       return __kvm_apic_update_irr(pir, apic->regs);
>>>  }
>>>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>>>
>>> @@ -423,9 +419,10 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
>>>         vcpu = apic->vcpu;
>>>
>>>         if (unlikely(vcpu->arch.apicv_active)) {
>>> -               /* try to update RVI */
>>> +               /* need to update RVI */
>>>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
>>> -               kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> +               kvm_x86_ops->hwapic_irr_update(vcpu,
>>> +                               apic_find_highest_irr(apic));
>>>         } else {
>>>                 apic->irr_pending = false;
>>>                 apic_clear_vector(vec, apic->regs + APIC_IRR);
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 27e40b180242..3dd4fad35a3e 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -5062,9 +5062,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>>>         if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>>>                 return;
>>>
>>> -       r = pi_test_and_set_on(&vmx->pi_desc);
>>> -       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> -       if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
>>> +       /* If a previous notification has sent the IPI, nothing to do.  */
>>> +       if (pi_test_and_set_on(&vmx->pi_desc))
>>> +               return;
>>> +
>>> +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>>>                 kvm_vcpu_kick(vcpu);
>>>  }
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index c666414adc1d..725473ba6dd3 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -6710,19 +6710,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>                         kvm_hv_process_stimers(vcpu);
>>>         }
>>>
>>> -       /*
>>> -        * KVM_REQ_EVENT is not set when posted interrupts are set by
>>> -        * VT-d hardware, so we have to update RVI unconditionally.
>>> -        */
>>> -       if (kvm_lapic_enabled(vcpu)) {
>>> -               /*
>>> -                * Update architecture specific hints for APIC
>>> -                * virtual interrupt delivery.
>>> -                */
>>> -               if (kvm_x86_ops->sync_pir_to_irr)
>>> -                       kvm_x86_ops->sync_pir_to_irr(vcpu);
>>> -       }
>>> -
>>>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>                 ++vcpu->stat.req_event;
>>>                 kvm_apic_accept_events(vcpu);
>>> @@ -6767,20 +6754,39 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>         kvm_x86_ops->prepare_guest_switch(vcpu);
>>>         if (vcpu->fpu_active)
>>>                 kvm_load_guest_fpu(vcpu);
>>> +
>>> +       /*
>>> +        * Disabling IRQs before setting IN_GUEST_MODE.  Posted interrupt
>>> +        * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
>>> +        * IPIs are then delayed until after guest entry, which ensures that they
>>> +        */
>>> +       local_irq_disable();
>>>         vcpu->mode = IN_GUEST_MODE;
>>>
>>>         srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>>>
>>>         /*
>>> -        * We should set ->mode before check ->requests,
>>> -        * Please see the comment in kvm_make_all_cpus_request.
>>> -        * This also orders the write to mode from any reads
>>> -        * to the page tables done while the VCPU is running.
>>> -        * Please see the comment in kvm_flush_remote_tlbs.
>>> +        * 1) We should set ->mode before checking ->requests.  Please see
>>> +        * the comment in kvm_make_all_cpus_request.
>>> +        *
>>> +        * 2) For APICv, we should set ->mode before checking PIR.ON.  This
>>> +        * pairs with the memory barrier implicit in pi_test_and_set_on
>>> +        * (see vmx_deliver_posted_interrupt).
>>> +        *
>>> +        * 3) This also orders the write to mode from any reads to the page
>>> +        * tables done while the VCPU is running.  Please see the comment
>>> +        * in kvm_flush_remote_tlbs.
>>>          */
>>>         smp_mb__after_srcu_read_unlock();
>>>
>>> -       local_irq_disable();
>>
>> The local_irq_disable() movement is unnecessary if you move sync_pir_to_irr.
> 
> In addition, this movement lengthens the interrupt-disabled window to
> some degree.  Do you think I can send a patch to revert it?

The difference is a few dozen to a few hundred clock cycles; I don't
think it matters.  Also, a posted interrupt sent to the host while
IN_GUEST_MODE is more expensive than one sent while the processor is in
non-root mode.

All in all, I think it's preferable to keep the local_irq_disable here.
Your observation seems correct, though.

Paolo

> Regards,
> Wanpeng Li
> 
>>
>> - If the IPI arrives after vcpu->mode = IN_GUEST_MODE and the interrupt
>> disable, posted interrupt delivery succeeds.
>> - If the IPI arrives between vcpu->mode = IN_GUEST_MODE and the interrupt
>> disable, sync_pir_to_irr will catch the PIR update and set RVI.
>>
>> Regards,
>> Wanpeng Li
>>
>>> +       if (kvm_lapic_enabled(vcpu)) {
>>> +               /*
>>> +                * This handles the case where a posted interrupt was
>>> +                * notified with kvm_vcpu_kick.
>>> +                */
>>> +               if (kvm_x86_ops->sync_pir_to_irr)
>>> +                       kvm_x86_ops->sync_pir_to_irr(vcpu);
>>> +       }
>>>
>>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>>             || need_resched() || signal_pending(current)) {
>>> --
>>> 1.8.3.1
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, newest: 2017-03-09 10:04 UTC

Thread overview: 25+ messages
2016-12-19 16:17 [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
2016-12-19 16:17 ` [PATCH 1/6] KVM: vmx: clear pending interrupts on KVM_SET_LAPIC Paolo Bonzini
2017-02-07 17:42   ` Radim Krčmář
2016-12-19 16:17 ` [PATCH 2/6] kvm: nVMX: move nested events check to kvm_vcpu_running Paolo Bonzini
2017-02-07 18:16   ` Radim Krčmář
2016-12-19 16:17 ` [PATCH 3/6] KVM: x86: preparatory changes for APICv cleanups Paolo Bonzini
2017-02-07 18:20   ` Radim Krčmář
2016-12-19 16:17 ` [PATCH 4/6] KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers Paolo Bonzini
2016-12-19 16:17 ` [PATCH 5/6] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
2017-02-07 20:19   ` Radim Krčmář
2017-02-07 21:49     ` Radim Krčmář
2017-02-08 14:10     ` Paolo Bonzini
2017-02-08 14:24       ` Radim Krčmář
2016-12-19 16:17 ` [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
2017-02-07 19:58   ` Radim Krčmář
2017-02-08 16:23     ` Paolo Bonzini
2017-02-09 15:11       ` Radim Krčmář
2017-03-09  1:23   ` Wanpeng Li
2017-03-09  9:40     ` Wanpeng Li
2017-03-09 10:03       ` Paolo Bonzini
2017-02-07 17:23 ` [PATCH v2 0/6] KVM: x86: cleanup and speedup for APICv Paolo Bonzini
2017-02-07 21:52   ` Radim Krčmář
2017-02-08 10:04     ` Paolo Bonzini
2017-02-08 13:33       ` Radim Krčmář
2017-02-08 15:01         ` Paolo Bonzini
