* [RFC PATCH 0/3] kvm: x86: speedups for APICv
@ 2016-09-27 21:20 Paolo Bonzini
  2016-09-27 21:20 ` [PATCH 1/3] kvm: x86: avoid atomic operations on APICv vmentry Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-27 21:20 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yang.zhang.wz, feng.wu, mst, rkrcmar

These patches save 40 cycles on kvm-unit-tests "inl" tests, but I suspect
the effect is bigger on workloads that inject interrupts heavily.

The difficult one is patch 2.  I find the new code easier to follow than
the old one, but that doesn't mean it works. :)  The aim is for APICv to
not use KVM_REQ_EVENT at all for interrupts, therefore turning APICv's
weakness (having to look at PIR on every vmentry) into a strength
(because checking PIR.ON is cheaper than processing KVM_REQ_EVENT).
The others are just micro-optimization of the surrounding code.

Tested extremely lightly and obviously not for 4.9, just shooting it
out now.

Paolo

Paolo Bonzini (3):
  kvm: x86: avoid atomic operations on APICv vmentry
  kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  KVM: x86: do not scan IRR twice on APICv vmentry

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            | 31 +++++++++++++++++++------------
 arch/x86/kvm/lapic.h            |  4 ++--
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx.c              | 30 ++++++++++++++++++------------
 arch/x86/kvm/x86.c              | 16 ++++++++++------
 6 files changed, 51 insertions(+), 34 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] kvm: x86: avoid atomic operations on APICv vmentry
  2016-09-27 21:20 [RFC PATCH 0/3] kvm: x86: speedups for APICv Paolo Bonzini
@ 2016-09-27 21:20 ` Paolo Bonzini
  2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-27 21:20 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yang.zhang.wz, feng.wu, mst, rkrcmar

On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
posted interrupts turn out to be slower than interrupt injection via
KVM_REQ_EVENT.

This patch slightly optimizes the IRR update, avoiding expensive atomic
operations in the common case where PI.ON=0 at vmentry or the PIR vector
is mostly zero.  This saves around 20 cycles per vmexit, as
measured by kvm-unit-tests' inl_from_qemu test (20 runs):

              | enable_apicv=1  |  enable_apicv=0
              | mean     stdev  |  mean     stdev
    ----------|-----------------|------------------
    before    | 5826     32.65  |  5765     47.09
    after     | 5809     43.42  |  5777     77.02

Of course, any change in the right column is just a placebo effect. :)
The savings are bigger if interrupts are frequent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 6 ++++--
 arch/x86/kvm/vmx.c   | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 23b99f305382..63a442aefc12 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
 	u32 i, pir_val;
 
 	for (i = 0; i <= 7; i++) {
-		pir_val = xchg(&pir[i], 0);
-		if (pir_val)
+		pir_val = READ_ONCE(pir[i]);
+		if (pir_val) {
+			pir_val = xchg(&pir[i], 0);
 			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2577183b40d9..b33eee395b00 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4854,7 +4854,8 @@ static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (!pi_test_and_clear_on(&vmx->pi_desc))
+	if (!pi_test_on(&vmx->pi_desc) ||
+	    !pi_test_and_clear_on(&vmx->pi_desc))
 		return;
 
 	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
-- 
1.8.3.1

* [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-27 21:20 [RFC PATCH 0/3] kvm: x86: speedups for APICv Paolo Bonzini
  2016-09-27 21:20 ` [PATCH 1/3] kvm: x86: avoid atomic operations on APICv vmentry Paolo Bonzini
@ 2016-09-27 21:20 ` Paolo Bonzini
  2016-09-27 23:07   ` Michael S. Tsirkin
                     ` (2 more replies)
  2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
  2016-09-29 19:55 ` [RFC PATCH 0/3] kvm: x86: speedups for APICv Radim Krčmář
  3 siblings, 3 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-27 21:20 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yang.zhang.wz, feng.wu, mst, rkrcmar

Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.

However, there is a race between vmx_deliver_posted_interrupt and
vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
set to IN_GUEST_MODE.

Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c | 2 --
 arch/x86/kvm/vmx.c   | 8 +++++---
 arch/x86/kvm/x86.c   | 9 +++++++--
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 63a442aefc12..be8b7ad56dd1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	__kvm_apic_update_irr(pir, apic->regs);
-
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b33eee395b00..207b9aa32915 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
 		return;
 
-	r = pi_test_and_set_on(&vmx->pi_desc);
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
-	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
+	/* If a previous notification has sent the IPI, nothing to do.  */
+	if (pi_test_and_set_on(&vmx->pi_desc))
+		return;
+
+	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
 		kvm_vcpu_kick(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ee8a91a78c3..604cfbfc5bee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->prepare_guest_switch(vcpu);
 	if (vcpu->fpu_active)
 		kvm_load_guest_fpu(vcpu);
+
+	/*
+	 * Disable IRQs before setting IN_GUEST_MODE, so that
+	 * posted interrupts with vcpu->mode == IN_GUEST_MODE
+	 * always result in virtual interrupt delivery.
+	 */
+	local_irq_disable();
 	vcpu->mode = IN_GUEST_MODE;
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
@@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 */
 	smp_mb__after_srcu_read_unlock();
 
-	local_irq_disable();
-
 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-- 
1.8.3.1

* [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-09-27 21:20 [RFC PATCH 0/3] kvm: x86: speedups for APICv Paolo Bonzini
  2016-09-27 21:20 ` [PATCH 1/3] kvm: x86: avoid atomic operations on APICv vmentry Paolo Bonzini
  2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
@ 2016-09-27 21:20 ` Paolo Bonzini
  2016-09-28 14:00   ` Michael S. Tsirkin
                     ` (2 more replies)
  2016-09-29 19:55 ` [RFC PATCH 0/3] kvm: x86: speedups for APICv Radim Krčmář
  3 siblings, 3 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-27 21:20 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yang.zhang.wz, feng.wu, mst, rkrcmar

Calling apic_find_highest_irr results in IRR being scanned twice,
once in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
vcpu_enter_guest to use sync_pir_from_irr (with a new argument to
trigger the RVI write), and let sync_pir_from_irr get the new RVI
directly from kvm_apic_update_irr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            | 25 ++++++++++++++++---------
 arch/x86/kvm/lapic.h            |  4 ++--
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx.c              | 25 ++++++++++++++-----------
 arch/x86/kvm/x86.c              |  7 +++----
 6 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b20f7304b9c..f73eea243567 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -941,7 +941,7 @@ struct kvm_x86_ops {
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
-	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
+	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu, bool sync_rvi);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index be8b7ad56dd1..85b79b90e3d0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -317,7 +317,7 @@ static int find_highest_vector(void *bitmap)
 	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
 		reg = bitmap + REG_POS(vec);
 		if (*reg)
-			return fls(*reg) - 1 + vec;
+			return __fls(*reg) + vec;
 	}
 
 	return -1;
@@ -337,25 +337,32 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }
 
-void __kvm_apic_update_irr(u32 *pir, void *regs)
+int __kvm_apic_update_irr(u32 *pir, void *regs)
 {
-	u32 i, pir_val;
+	u32 i, vec;
+	u32 pir_val, irr_val;
+	int max_irr = -1;
 
-	for (i = 0; i <= 7; i++) {
+	for (i = vec = 0; i <= 7; i++, vec += 32) {
 		pir_val = READ_ONCE(pir[i]);
+		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
 		if (pir_val) {
-			pir_val = xchg(&pir[i], 0);
-			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
+			irr_val |= xchg(&pir[i], 0);
+			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
 		}
+		if (irr_val)
+			max_irr = __fls(irr_val) + vec;
 	}
+
+	return max_irr;
 }
 EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
 
-void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
+int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	__kvm_apic_update_irr(pir, apic->regs);
+	return __kvm_apic_update_irr(pir, apic->regs);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
@@ -376,7 +383,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 		return -1;
 
 	if (apic->vcpu->arch.apicv_active)
-		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
+		kvm_x86_ops->sync_pir_to_irr(apic->vcpu, false);
 	result = apic_search_irr(apic);
 	ASSERT(result == -1 || result >= 16);
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index f60d01c29d51..fc72828c3782 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -70,8 +70,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			   int short_hand, unsigned int dest, int dest_mode);
 
-void __kvm_apic_update_irr(u32 *pir, void *regs);
-void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
+int __kvm_apic_update_irr(u32 *pir, void *regs);
+int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 		     struct dest_map *dest_map);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9b4221471e3d..60e53fa2b554 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4383,7 +4383,7 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	return;
 }
 
-static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
 {
 	return;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 207b9aa32915..1edefab54d01 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4852,17 +4852,6 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 		kvm_vcpu_kick(vcpu);
 }
 
-static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	if (!pi_test_on(&vmx->pi_desc) ||
-	    !pi_test_and_clear_on(&vmx->pi_desc))
-		return;
-
-	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
-}
-
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -8609,6 +8598,20 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 	}
 }
 
+static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int max_irr;
+
+	if (!pi_test_on(&vmx->pi_desc) ||
+	    !pi_test_and_clear_on(&vmx->pi_desc))
+		return;
+
+	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+	if (sync_rvi)
+		vmx_hwapic_irr_update(vcpu, max_irr);
+}
+
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
 	if (!kvm_vcpu_apicv_active(vcpu))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 604cfbfc5bee..148e14fdc55d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2821,7 +2821,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
 	if (vcpu->arch.apicv_active)
-		kvm_x86_ops->sync_pir_to_irr(vcpu);
+		kvm_x86_ops->sync_pir_to_irr(vcpu, false);
 
 	return kvm_apic_get_state(vcpu, s);
 }
@@ -6448,7 +6448,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
 	else {
 		if (vcpu->arch.apicv_active)
-			kvm_x86_ops->sync_pir_to_irr(vcpu);
+			kvm_x86_ops->sync_pir_to_irr(vcpu, false);
 		kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
 	}
 	bitmap_or((ulong *)eoi_exit_bitmap, vcpu->arch.ioapic_handled_vectors,
@@ -6611,8 +6611,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * virtual interrupt delivery.
 		 */
 		if (vcpu->arch.apicv_active)
-			kvm_x86_ops->hwapic_irr_update(vcpu,
-				kvm_lapic_find_highest_irr(vcpu));
+			kvm_x86_ops->sync_pir_to_irr(vcpu, true);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-- 
1.8.3.1

* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
@ 2016-09-27 23:07   ` Michael S. Tsirkin
  2016-09-28  8:21     ` Paolo Bonzini
  2016-09-28 10:04   ` Wu, Feng
  2016-10-14  7:12   ` Yang Zhang
  2 siblings, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2016-09-27 23:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar

On Tue, Sep 27, 2016 at 11:20:12PM +0200, Paolo Bonzini wrote:
> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> the complicated event injection path.
> 
> However, there is a race between vmx_deliver_posted_interrupt and
> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
> set to IN_GUEST_MODE.

Could you describe the race a bit more please?
I'm surprised that locally disabling irqs
fixes a race with a different CPU.

> 
> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
> there since APICv was introduced.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 2 --
>  arch/x86/kvm/vmx.c   | 8 +++++---
>  arch/x86/kvm/x86.c   | 9 +++++++--
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 63a442aefc12..be8b7ad56dd1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
>  	__kvm_apic_update_irr(pir, apic->regs);
> -
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b33eee395b00..207b9aa32915 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>  		return;
>  
> -	r = pi_test_and_set_on(&vmx->pi_desc);
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
> -	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))

so the logic here was reversed - kick was not called
when PIR=OFF previously. It's a bug but what was its effect?
I'm guessing - extra latency where VCPU won't get
woken up correctly?

> +	/* If a previous notification has sent the IPI, nothing to do.  */
> +	if (pi_test_and_set_on(&vmx->pi_desc))
> +		return;
> +
> +	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>  		kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3ee8a91a78c3..604cfbfc5bee 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);
> +
> +	/*
> +	 * Disable IRQs before setting IN_GUEST_MODE, so that
> +	 * posted interrupts with vcpu->mode == IN_GUEST_MODE
> +	 * always result in virtual interrupt delivery.
> +	 */
> +	local_irq_disable();
>  	vcpu->mode = IN_GUEST_MODE;
>  
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 */
>  	smp_mb__after_srcu_read_unlock();
>  
> -	local_irq_disable();
> -
>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -- 
> 1.8.3.1
> 

* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-27 23:07   ` Michael S. Tsirkin
@ 2016-09-28  8:21     ` Paolo Bonzini
  2016-09-28 11:40       ` Wu, Feng
  2016-09-28 13:46       ` Michael S. Tsirkin
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28  8:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar



On 28/09/2016 01:07, Michael S. Tsirkin wrote:
> On Tue, Sep 27, 2016 at 11:20:12PM +0200, Paolo Bonzini wrote:
>> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
>> is blocked", 2015-09-18) the posted interrupt descriptor is checked
>> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
>> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
>> the complicated event injection path.
>>
>> However, there is a race between vmx_deliver_posted_interrupt and
>> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
>> set to IN_GUEST_MODE.
> 
> Could you describe the race a bit more please?
> I'm surprised that locally disabling irqs
> fixes a race with a different CPU.

The posted interrupt IPI has a dummy handler in arch/x86/kernel/irq.c
(smp_kvm_posted_intr_ipi).  It only does something if it is received
while the guest is running.

So local_irq_disable has an interesting side effect.  Because the
interrupt will not be processed until the guest is entered,
local_irq_disable effectively switches the IRQ handler from the dummy
handler to the processor's posted interrupt handling.

So you want to do that before setting IN_GUEST_MODE, otherwise the IPI
sent by deliver_posted_interrupt is ignored.

However, the patch is wrong, because this bit:

        if (kvm_lapic_enabled(vcpu)) {
                /*
                 * Update architecture specific hints for APIC
                 * virtual interrupt delivery.
                 */
                if (vcpu->arch.apicv_active)
                        kvm_x86_ops->hwapic_irr_update(vcpu,
                                kvm_lapic_find_highest_irr(vcpu));
        }

also has to be moved after setting IN_GUEST_MODE.  Basically the order
for interrupt injection is:

(1)	set PIR
	smp_wmb()
(2)	set ON
	smp_mb()
(3)	read vcpu->mode
	if IN_GUEST_MODE
(4a)		send posted interrupt IPI
	else
(4b)		kick (i.e. cmpxchg vcpu->mode from IN_GUEST_MODE to
		      EXITING_GUEST_MODE and send reschedule IPI)

while the order for entering the guest must be the opposite.  The
numbers on the left identify the pairing between interrupt injection and
vcpu_enter_guest.

(4a)	enable posted interrupt processing (i.e. disable interrupts!)
(3)	set vcpu->mode to IN_GUEST_MODE
	smp_mb()
(2)	read ON
	if ON then
(1)		read PIR
		sync PIR to IRR
(4b)	read vcpu->mode
	if vcpu->mode == EXITING_GUEST_MODE then
		cancel vmentry
(3/2/1)		# posted interrupts are processed on the next vmentry

Paolo

* RE: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
  2016-09-27 23:07   ` Michael S. Tsirkin
@ 2016-09-28 10:04   ` Wu, Feng
  2016-09-28 10:16     ` Paolo Bonzini
  2016-10-14  7:12   ` Yang Zhang
  2 siblings, 1 reply; 29+ messages in thread
From: Wu, Feng @ 2016-09-28 10:04 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Wednesday, September 28, 2016 5:20 AM
> To: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Cc: yang.zhang.wz@gmail.com; Wu, Feng <feng.wu@intel.com>;
> mst@redhat.com; rkrcmar@redhat.com
> Subject: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt
> injection
> 
> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> the complicated event injection path.

But the following code still remains in the KVM_REQ_EVENT checking part:

if (kvm_lapic_enabled(vcpu)) {
	update_cr8_intercept(vcpu);
	kvm_lapic_sync_to_vapic(vcpu);
}

Does this matter?

Thanks,
Feng

> 
> However, there is a race between vmx_deliver_posted_interrupt and
> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
> set to IN_GUEST_MODE.
> 
> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
> there since APICv was introduced.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 2 --
>  arch/x86/kvm/vmx.c   | 8 +++++---
>  arch/x86/kvm/x86.c   | 9 +++++++--
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 63a442aefc12..be8b7ad56dd1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32
> *pir)
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> 
>  	__kvm_apic_update_irr(pir, apic->regs);
> -
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b33eee395b00..207b9aa32915 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct
> kvm_vcpu *vcpu, int vector)
>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>  		return;
> 
> -	r = pi_test_and_set_on(&vmx->pi_desc);
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
> -	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
> +	/* If a previous notification has sent the IPI, nothing to do.  */
> +	if (pi_test_and_set_on(&vmx->pi_desc))
> +		return;
> +
> +	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>  		kvm_vcpu_kick(vcpu);
>  }
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3ee8a91a78c3..604cfbfc5bee 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);
> +
> +	/*
> +	 * Disable IRQs before setting IN_GUEST_MODE, so that
> +	 * posted interrupts with vcpu->mode == IN_GUEST_MODE
> +	 * always result in virtual interrupt delivery.
> +	 */
> +	local_irq_disable();
>  	vcpu->mode = IN_GUEST_MODE;
> 
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 */
>  	smp_mb__after_srcu_read_unlock();
> 
> -	local_irq_disable();
> -
>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> --
> 1.8.3.1
> 

* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 10:04   ` Wu, Feng
@ 2016-09-28 10:16     ` Paolo Bonzini
  2016-09-28 11:53       ` Wu, Feng
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 10:16 UTC (permalink / raw)
  To: Wu, Feng, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar



On 28/09/2016 12:04, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
>> Bonzini
>> Sent: Wednesday, September 28, 2016 5:20 AM
>> To: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>> Cc: yang.zhang.wz@gmail.com; Wu, Feng <feng.wu@intel.com>;
>> mst@redhat.com; rkrcmar@redhat.com
>> Subject: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt
>> injection
>>
>> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
>> is blocked", 2015-09-18) the posted interrupt descriptor is checked
>> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
>> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
>> the complicated event injection path.
> 
> But the following code still remains in the KVM_REQ_EVENT checking part:
> 
> if (kvm_lapic_enabled(vcpu)) {
> 	update_cr8_intercept(vcpu);
> 	kvm_lapic_sync_to_vapic(vcpu);
> }
> 
> Does this matter?

Good question, but it doesn't matter for APICv because:

- update_cr8_intercept is disabled if APICv, see vmx.c:

        if (enable_apicv)
                kvm_x86_ops->update_cr8_intercept = NULL;

- kvm_lapic_sync_to_vapic's call to apic_sync_pv_eoi_to_guest is also
disabled if APICv:

        if (!pv_eoi_enabled(vcpu) ||
            apic->irr_pending ||
            apic->highest_isr_cache == -1 ||
            kvm_ioapic_handles_vector(apic, apic->highest_isr_cache))
                return;

(highest_isr_cache is always -1 for APICv)

- The TPR/ISR/IRR shadow that kvm_lapic_sync_to_vapic writes is only
read by the  paravirtualized TPR access code in the vAPIC ROM
(pc-bios/optionrom/kvmvapic.S in the QEMU tree). That code never runs if
you don't get TPR access vmexits, and indeed TPR access vmexits never
happen if KVM uses APICv (or even only the old-style TPR shadowing).

Paolo

> Thanks,
> Feng
> 
>>
>> However, there is a race between vmx_deliver_posted_interrupt and
>> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
>> set to IN_GUEST_MODE.
>>
>> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
>> there since APICv was introduced.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/lapic.c | 2 --
>>  arch/x86/kvm/vmx.c   | 8 +++++---
>>  arch/x86/kvm/x86.c   | 9 +++++++--
>>  3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 63a442aefc12..be8b7ad56dd1 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32
>> *pir)
>>  	struct kvm_lapic *apic = vcpu->arch.apic;
>>
>>  	__kvm_apic_update_irr(pir, apic->regs);
>> -
>> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b33eee395b00..207b9aa32915 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct
>> kvm_vcpu *vcpu, int vector)
>>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>>  		return;
>>
>> -	r = pi_test_and_set_on(&vmx->pi_desc);
>> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
>> +	/* If a previous notification has sent the IPI, nothing to do.  */
>> +	if (pi_test_and_set_on(&vmx->pi_desc))
>> +		return;
>> +
>> +	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>>  		kvm_vcpu_kick(vcpu);
>>  }
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 3ee8a91a78c3..604cfbfc5bee 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>>  	if (vcpu->fpu_active)
>>  		kvm_load_guest_fpu(vcpu);
>> +
>> +	/*
>> +	 * Disable IRQs before setting IN_GUEST_MODE, so that
>> +	 * posted interrupts with vcpu->mode == IN_GUEST_MODE
>> +	 * always result in virtual interrupt delivery.
>> +	 */
>> +	local_irq_disable();
>>  	vcpu->mode = IN_GUEST_MODE;
>>
>>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>> @@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  	 */
>>  	smp_mb__after_srcu_read_unlock();
>>
>> -	local_irq_disable();
>> -
>>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>  	    || need_resched() || signal_pending(current)) {
>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>> --
>> 1.8.3.1
>>
> 

* RE: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28  8:21     ` Paolo Bonzini
@ 2016-09-28 11:40       ` Wu, Feng
  2016-09-28 11:50         ` Paolo Bonzini
  2016-09-28 13:46       ` Michael S. Tsirkin
  1 sibling, 1 reply; 29+ messages in thread
From: Wu, Feng @ 2016-09-28 11:40 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: linux-kernel, kvm, yang.zhang.wz, rkrcmar, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, September 28, 2016 4:22 PM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> yang.zhang.wz@gmail.com; Wu, Feng <feng.wu@intel.com>;
> rkrcmar@redhat.com
> Subject: Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt injection
> 
> 
> 
> On 28/09/2016 01:07, Michael S. Tsirkin wrote:
> > On Tue, Sep 27, 2016 at 11:20:12PM +0200, Paolo Bonzini wrote:
> >> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> >> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> >> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> >> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> >> the complicated event injection path.
> >>
> >> However, there is a race between vmx_deliver_posted_interrupt and
> >> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
> >> set to IN_GUEST_MODE.
> >
> > Could you describe the race a bit more please?
> > I'm surprised that locally disabling irqs
> > fixes a race with a different CPUs.
> 
> The posted interrupt IPI has a dummy handler in arch/x86/kernel/irq.c
> (smp_kvm_posted_intr_ipi).  It only does something if it is received
> while the guest is running.
> 
> So local_irq_disable has an interesting side effect.  Because the
> interrupt will not be processed until the guest is entered,
> local_irq_disable effectively switches the IRQ handler from the dummy
> handler to the processor's posted interrupt handling.
> 
> So you want to do that before setting IN_GUEST_MODE, otherwise the IPI
> sent by deliver_posted_interrupt is ignored.

IIUIC, the issue you describe above is that the IPI for posted interrupts may be
issued between

vcpu->mode = IN_GUEST_MODE;

and

local_irq_disable();

But if that really happens, we will call kvm_vcpu_kick() in
vmx_deliver_posted_interrupt(), hence vcpu->mode will be changed
to EXITING_GUEST_MODE, and then we will go to cancel_injection in
vcpu_enter_guest, so the posted interrupt will be delivered to the guest
at the next vmentry. It seems I cannot see the problem. Am I missing something?

Thanks,
Feng

> 
> However, the patch is wrong, because this bit:
> 
>         if (kvm_lapic_enabled(vcpu)) {
>                 /*
>                  * Update architecture specific hints for APIC
>                  * virtual interrupt delivery.
>                  */
>                 if (vcpu->arch.apicv_active)
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
>         }
> 
> also has to be moved after setting IN_GUEST_MODE.  Basically the order
> for interrupt injection is:
> 
> (1)	set PIR
> 	smp_wmb()
> (2)	set ON
> 	smp_mb()
> (3)	read vcpu->mode
> 	if IN_GUEST_MODE
> (4a)		send posted interrupt IPI
> 	else
> (4b)		kick (i.e. cmpxchg vcpu->mode from IN_GUEST_MODE to
> 		      EXITING_GUEST_MODE and send reschedule IPI)
> 
> while the order for entering the guest must be the opposite.  The
> numbers on the left identify the pairing between interrupt injection and
> vcpu_enter_guest:
> 
> (4a)	enable posted interrupt processing (i.e. disable interrupts!)
> (3)	set vcpu->mode to IN_GUEST_MODE
> 	smp_mb()
> (2)	read ON
> 	if ON then
> (1)		read PIR
> 		sync PIR to IRR
> (4b)	read vcpu->mode
> 	if vcpu->mode == EXITING_GUEST_MODE then
> 		cancel vmentry
> (3/2/1)		# posted interrupts are processed on the next vmentry
> 
> Paolo


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 11:40       ` Wu, Feng
@ 2016-09-28 11:50         ` Paolo Bonzini
  2016-09-28 12:06           ` Wu, Feng
  2016-10-14  7:37           ` Yang Zhang
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 11:50 UTC (permalink / raw)
  To: Wu, Feng, Michael S. Tsirkin; +Cc: linux-kernel, kvm, yang.zhang.wz, rkrcmar



On 28/09/2016 13:40, Wu, Feng wrote:
> IIUIC, the issue you describe above is that IPI for posted-interrupts may be
> issued between
> 
> vcpu->mode = IN_GUEST_MODE;
> 
> and
> 
> local_irq_disable();
> 
> But if that really happens, we will call kvm_vcpu_kick() in
> vmx_deliver_posted_interrupt(), hence the vcpu->mode will be changed
> to EXITING_GUEST_MODE, then we will goto cancel_injection in
> vcpu_enter_guest, so the posted-interrupt will be delivered to guest
> in the next vmentry. Seems I cannot see the problem. Do I miss something?

No, if that happens kvm_vcpu_trigger_posted_interrupt returns true, hence
kvm_vcpu_kick is not called.  With the fix, the IPI is processed as soon
as the guest enters non-root mode, and the interrupt is injected.


The other issue occurs when the IPI is sent between

                        kvm_x86_ops->hwapic_irr_update(vcpu,
                                kvm_lapic_find_highest_irr(vcpu));

and

	vcpu->mode = IN_GUEST_MODE;

In this case, kvm_vcpu_kick is called but it (correctly) doesn't do
anything because it sees vcpu->mode == OUTSIDE_GUEST_MODE.  Then the
guest is entered with PIR.ON, but the PI interrupt is not pending and
hence the interrupt is never delivered to the guest.  The fix for this
is to move the RVI update after IN_GUEST_MODE.  Then the source CPU uses
the posted interrupt IPI instead of kvm_vcpu_kick, and everything works.

Paolo
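To make the two windows concrete, here is a toy single-threaded model of the interleavings described above. All names are illustrative stand-ins for the kernel's structures, not real KVM APIs; the barriers and the IRQs-off state are implied by the fixed step order, and booleans stand in for the PIR/IRR bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

enum mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu {
	bool pir;       /* a bit pending in the PI descriptor's PIR */
	bool on;        /* PIR.ON */
	bool irr;       /* the bit mirrored into the virtual-APIC IRR */
	bool rvi;       /* RVI tracks what was synced into IRR */
	bool pi_ipi;    /* posted-interrupt IPI latched while IRQs are off */
	enum mode mode;
};

/* Sender side: (1) set PIR, (2) set ON, (3) read vcpu->mode, (4) notify. */
static void deliver_posted_interrupt(struct vcpu *v)
{
	v->pir = true;
	v->on = true;
	if (v->mode == IN_GUEST_MODE)
		v->pi_ipi = true;   /* (4a) IPI is latched, fires at vmentry */
	/* else (4b) kick -- a no-op here, the vCPU is not in guest mode */
}

/* vcpu_enter_guest's IRR scan, i.e. the PIR->IRR sync. */
static void sync_pir_to_irr(struct vcpu *v)
{
	if (v->on) {
		v->on = false;
		v->irr |= v->pir;
		v->pir = false;
		v->rvi = v->irr;
	}
}

/* Buggy order: the RVI update runs before vcpu->mode = IN_GUEST_MODE. */
static bool enter_guest_buggy(struct vcpu *v)
{
	sync_pir_to_irr(v);            /* nothing pending yet */
	deliver_posted_interrupt(v);   /* interrupt lands in the window */
	v->mode = IN_GUEST_MODE;
	return v->rvi || v->pi_ipi;    /* interrupt visible at vmentry? */
}

/* Fixed order: IN_GUEST_MODE is set (IRQs off) before the scan. */
static bool enter_guest_fixed(struct vcpu *v)
{
	v->mode = IN_GUEST_MODE;
	deliver_posted_interrupt(v);   /* same arrival point */
	sync_pir_to_irr(v);
	return v->rvi || v->pi_ipi;
}
```

In the buggy order the sender sees OUTSIDE_GUEST_MODE, the kick is a no-op, and the guest enters with PIR.ON set but no RVI update and no latched IPI; in the fixed order the sender sees IN_GUEST_MODE and the latched posted-interrupt IPI (plus the PIR->IRR sync) makes the interrupt visible at vmentry.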


* RE: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 10:16     ` Paolo Bonzini
@ 2016-09-28 11:53       ` Wu, Feng
  2016-09-28 11:55         ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Wu, Feng @ 2016-09-28 11:53 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, September 28, 2016 6:17 PM
> To: Wu, Feng <feng.wu@intel.com>; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Cc: yang.zhang.wz@gmail.com; mst@redhat.com; rkrcmar@redhat.com
> Subject: Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt injection
> 
> 
> 
> On 28/09/2016 12:04, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> >> Bonzini
> >> Sent: Wednesday, September 28, 2016 5:20 AM
> >> To: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >> Cc: yang.zhang.wz@gmail.com; Wu, Feng <feng.wu@intel.com>;
> >> mst@redhat.com; rkrcmar@redhat.com
> >> Subject: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt
> >> injection
> >>
> >> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> >> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> >> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> >> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> >> the complicated event injection path.
> >
> > But the following code still remains in the KVM_REQ_EVENT checking part:
> >
> > if (kvm_lapic_enabled(vcpu)) {
> > 	update_cr8_intercept(vcpu);
> > 	kvm_lapic_sync_to_vapic(vcpu);
> > }
> >
> > Does this matter?
> 
> Good question, but it doesn't matter for APICv because:
> 
> - update_cr8_intercept is disabled if APICv, see vmx.c:
> 
>         if (enable_apicv)
>                 kvm_x86_ops->update_cr8_intercept = NULL;

Which tree are you using? I am using the Linux kernel tree at 4.8-rc7,
and I only see the following code in vmx.c:

if (!cpu_has_vmx_tpr_shadow())
	kvm_x86_ops->update_cr8_intercept = NULL;

> 
> - kvm_lapic_sync_to_vapic's call to apic_sync_pv_eoi_to_guest is also
> disabled if APICv:
> 
>         if (!pv_eoi_enabled(vcpu) ||
>             apic->irr_pending ||
>             apic->highest_isr_cache == -1 ||
>             kvm_ioapic_handles_vector(apic, apic->highest_isr_cache))
>                 return;
> 
> (highest_isr_cache is always -1 for APICv)
> 
> - The TPR/ISR/IRR shadow that kvm_lapic_sync_to_vapic writes is only
> read by the  paravirtualized TPR access code in the vAPIC ROM
> (pc-bios/optionrom/kvmvapic.S in the QEMU tree). That code never runs if
> you don't get TPR access vmexits, and indeed TPR access vmexits never
> happen if KVM uses APICv (or even only the old-style TPR shadowing).

Good to know this, thanks for sharing!

Thanks,
Feng


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 11:53       ` Wu, Feng
@ 2016-09-28 11:55         ` Paolo Bonzini
  2016-09-28 12:07           ` Wu, Feng
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 11:55 UTC (permalink / raw)
  To: Wu, Feng, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar



On 28/09/2016 13:53, Wu, Feng wrote:
>> > - update_cr8_intercept is disabled if APICv, see vmx.c:
>> > 
>> >         if (enable_apicv)
>> >                 kvm_x86_ops->update_cr8_intercept = NULL;
> Which tree are you using? I am using the Linux kernel tree at 4.8-rc7,
> and I only see the following code in vmx.c:
> 
> if (!cpu_has_vmx_tpr_shadow())
> 	kvm_x86_ops->update_cr8_intercept = NULL;

You're right, it was an older tree.  On the other hand, 4.8-rc7 has this
in update_cr8_intercept:

        if (vcpu->arch.apicv_active)
                return;

:)

Paolo


* RE: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 11:50         ` Paolo Bonzini
@ 2016-09-28 12:06           ` Wu, Feng
  2016-09-28 12:14             ` Paolo Bonzini
  2016-10-14  7:37           ` Yang Zhang
  1 sibling, 1 reply; 29+ messages in thread
From: Wu, Feng @ 2016-09-28 12:06 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: linux-kernel, kvm, yang.zhang.wz, rkrcmar, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, September 28, 2016 7:50 PM
> To: Wu, Feng <feng.wu@intel.com>; Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> yang.zhang.wz@gmail.com; rkrcmar@redhat.com
> Subject: Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt injection
> 
> 
> 
> On 28/09/2016 13:40, Wu, Feng wrote:
> > IIUIC, the issue you describe above is that IPI for posted-interrupts may be
> > issued between
> >
> > vcpu->mode = IN_GUEST_MODE;
> >
> > and
> >
> > local_irq_disable();
> >
> > But if that really happens, we will call kvm_vcpu_kick() in
> > vmx_deliver_posted_interrupt(), hence vcpu->mode will be changed
> > to EXITING_GUEST_MODE, and then we will go to cancel_injection in
> > vcpu_enter_guest, so the posted interrupt will be delivered to the guest
> > at the next vmentry. It seems I cannot see the problem. Am I missing something?
> 
> No, if that happens kvm_vcpu_trigger_posted_interrupt returns true, hence
> kvm_vcpu_kick is not called.  

Oops, I missed the "!" before the function call ...

> With the fix, the IPI is processed as soon
> as the guest enters non-root mode, and the interrupt is injected.

Exactly!

> 
> 
> The other issue occurs when the IPI is sent between
> 
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
> 
> and
> 
> 	vcpu->mode = IN_GUEST_MODE;
> 
> In this case, kvm_vcpu_kick is called but it (correctly) doesn't do
> anything because it sees vcpu->mode == OUTSIDE_GUEST_MODE.  Then the
> guest is entered with PIR.ON, but the PI interrupt is not pending and
> hence the interrupt is never delivered to the guest.  

Why "never", at least, the interrupt should be delivered to the guest in the next
vm-entry, right? I mean vm-entry -> vm-exit -> _vm-entry_ (interrupts will be
delivered at this vm-entery).

Thanks,
Feng

> The fix for this
> is to move the RVI update after IN_GUEST_MODE.  Then the source CPU uses
> the posted interrupt IPI instead of kvm_vcpu_kick, and everything works.
> 
> Paolo


* RE: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 11:55         ` Paolo Bonzini
@ 2016-09-28 12:07           ` Wu, Feng
  0 siblings, 0 replies; 29+ messages in thread
From: Wu, Feng @ 2016-09-28 12:07 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar, Wu, Feng



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Paolo Bonzini
> Sent: Wednesday, September 28, 2016 7:55 PM
> To: Wu, Feng <feng.wu@intel.com>; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Cc: yang.zhang.wz@gmail.com; mst@redhat.com; rkrcmar@redhat.com
> Subject: Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt injection
> 
> 
> 
> On 28/09/2016 13:53, Wu, Feng wrote:
> >> > - update_cr8_intercept is disabled if APICv, see vmx.c:
> >> >
> >> >         if (enable_apicv)
> >> >                 kvm_x86_ops->update_cr8_intercept = NULL;
> > Which tree are you using, I am using linux kernel tree with Linux 4.8-rc7,
> > and I only see the following code in vmx.c
> >
> > if (!cpu_has_vmx_tpr_shadow())
> > 	kvm_x86_ops->update_cr8_intercept = NULL;
> 
> You're right, it was an older tree.  On the other hand, 4.8-rc7 has this
> in update_cr8_intercept:
> 
>         if (vcpu->arch.apicv_active)
>                 return;
> 
> :)

Oh, right! :)

Thanks,
Feng

> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 12:06           ` Wu, Feng
@ 2016-09-28 12:14             ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 12:14 UTC (permalink / raw)
  To: Wu, Feng, Michael S. Tsirkin; +Cc: linux-kernel, kvm, yang.zhang.wz, rkrcmar



On 28/09/2016 14:06, Wu, Feng wrote:
>> Then the
>> > guest is entered with PIR.ON, but the PI interrupt is not pending and
>> > hence the interrupt is never delivered to the guest.  
> Why "never", at least, the interrupt should be delivered to the guest in the next
> vm-entry, right? I mean vm-entry -> vm-exit -> _vm-entry_ (interrupts will be
> delivered at this vm-entry).

Sure, but you could in principle have a case where the vmexit never
happens (right now nohz_full CPUs have a 1 Hz timer tick, but that's
just a limitation of isolcpus).  When that happens, the interrupt might
never be delivered because it is not recorded in IRR.

Also, if the guest issues an EOI, the pending posted interrupt may be
reordered incorrectly with a lower-priority interrupt already in IRR.

But I'll re-check the wording in the commit message before posting the
non-RFC version.

Paolo


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28  8:21     ` Paolo Bonzini
  2016-09-28 11:40       ` Wu, Feng
@ 2016-09-28 13:46       ` Michael S. Tsirkin
  2016-09-28 14:08         ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2016-09-28 13:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar

On Wed, Sep 28, 2016 at 10:21:41AM +0200, Paolo Bonzini wrote:
> 
> 
> On 28/09/2016 01:07, Michael S. Tsirkin wrote:
> > On Tue, Sep 27, 2016 at 11:20:12PM +0200, Paolo Bonzini wrote:
> >> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> >> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> >> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> >> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> >> the complicated event injection path.
> >>
> >> However, there is a race between vmx_deliver_posted_interrupt and
> >> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
> >> set to IN_GUEST_MODE.
> > 
> > Could you describe the race a bit more please?
> > I'm surprised that locally disabling irqs
> > fixes a race with a different CPUs.
> 
> The posted interrupt IPI has a dummy handler in arch/x86/kernel/irq.c
> (smp_kvm_posted_intr_ipi).  It only does something if it is received
> while the guest is running.
> 
> So local_irq_disable has an interesting side effect.  Because the
> interrupt will not be processed until the guest is entered,
> local_irq_disable effectively switches the IRQ handler from the dummy
> handler to the processor's posted interrupt handling.
> 
> So you want to do that before setting IN_GUEST_MODE, otherwise the IPI
> sent by deliver_posted_interrupt is ignored.
> 
> However, the patch is wrong, because this bit:
> 
>         if (kvm_lapic_enabled(vcpu)) {
>                 /*
>                  * Update architecture specific hints for APIC
>                  * virtual interrupt delivery.
>                  */
>                 if (vcpu->arch.apicv_active)
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
>         }
> 
> also has to be moved after setting IN_GUEST_MODE.  Basically the order
> for interrupt injection is:
> 
> (1)	set PIR
> 	smp_wmb()

Empty on x86 btw but good for reasoning about barrier
pairing. So - where's the paired smp_rmb? See below.

> (2)	set ON
> 	smp_mb()

This one can be combined to smp_store_mb to save
a couple of cycles.

> (3)	read vcpu->mode
> 	if IN_GUEST_MODE
> (4a)		send posted interrupt IPI
> 	else
> (4b)		kick (i.e. cmpxchg vcpu->mode from IN_GUEST_MODE to
> 		      EXITING_GUEST_MODE and send reschedule IPI)
> 
> while the order for entering the guest must be the opposite.  The
> numbers on the left identify the pairing between interrupt injection and
> vcpu_enter_guest:
> 
> (4a)	enable posted interrupt processing (i.e. disable interrupts!)
> (3)	set vcpu->mode to IN_GUEST_MODE
> 	smp_mb()

This one can be combined to smp_store_mb to save
a couple of cycles.



> (2)	read ON
> 	if ON then

do we need smp_rmb here?

> (1)		read PIR
> 		sync PIR to IRR
> (4b)	read vcpu->mode
> 	if vcpu->mode == EXITING_GUEST_MODE then
> 		cancel vmentry
> (3/2/1)		# posted interrupts are processed on the next vmentry
> 
> Paolo


* Re: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
@ 2016-09-28 14:00   ` Michael S. Tsirkin
  2016-09-28 14:44     ` Paolo Bonzini
  2016-09-29  2:51   ` Wu, Feng
  2016-10-14  7:32   ` Yang Zhang
  2 siblings, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2016-09-28 14:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar

On Tue, Sep 27, 2016 at 11:20:13PM +0200, Paolo Bonzini wrote:
> Calling apic_find_highest_irr results in IRR being scanned twice,
> once in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
> vcpu_enter_guest to use sync_pir_from_irr (with a new argument to
> trigger the RVI write), and let sync_pir_from_irr get the new RVI
> directly from kvm_apic_update_irr.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/lapic.c            | 25 ++++++++++++++++---------
>  arch/x86/kvm/lapic.h            |  4 ++--
>  arch/x86/kvm/svm.c              |  2 +-
>  arch/x86/kvm/vmx.c              | 25 ++++++++++++++-----------
>  arch/x86/kvm/x86.c              |  7 +++----
>  6 files changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4b20f7304b9c..f73eea243567 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -941,7 +941,7 @@ struct kvm_x86_ops {
>  	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
>  	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> -	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
> +	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu, bool sync_rvi);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index be8b7ad56dd1..85b79b90e3d0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -317,7 +317,7 @@ static int find_highest_vector(void *bitmap)
>  	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
>  		reg = bitmap + REG_POS(vec);
>  		if (*reg)
> -			return fls(*reg) - 1 + vec;
> +			return __fls(*reg) + vec;
>  	}
>  
>  	return -1;

We checked that *reg is != 0 so __fls is safe.
It's a correct micro-optimization but why stick it in this patch?

> @@ -337,25 +337,32 @@ static u8 count_vectors(void *bitmap)
>  	return count;
>  }
>  
> -void __kvm_apic_update_irr(u32 *pir, void *regs)
> +int __kvm_apic_update_irr(u32 *pir, void *regs)
>  {
> -	u32 i, pir_val;
> +	u32 i, vec;
> +	u32 pir_val, irr_val;
> +	int max_irr = -1;
>  
> -	for (i = 0; i <= 7; i++) {
> +	for (i = vec = 0; i <= 7; i++, vec += 32) {
>  		pir_val = READ_ONCE(pir[i]);
> +		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
>  		if (pir_val) {
> -			pir_val = xchg(&pir[i], 0);
> -			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +			irr_val |= xchg(&pir[i], 0);
> +			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
>  		}
> +		if (irr_val)
> +			max_irr = __fls(irr_val) + vec;
>  	}
> +
> +	return max_irr;
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
>  
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -	__kvm_apic_update_irr(pir, apic->regs);
> +	return __kvm_apic_update_irr(pir, apic->regs);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>  
> @@ -376,7 +383,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  		return -1;
>  
>  	if (apic->vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(apic->vcpu, false);
>  	result = apic_search_irr(apic);
>  	ASSERT(result == -1 || result >= 16);
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index f60d01c29d51..fc72828c3782 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -70,8 +70,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
>  			   int short_hand, unsigned int dest, int dest_mode);
>  
> -void __kvm_apic_update_irr(u32 *pir, void *regs);
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
> +int __kvm_apic_update_irr(u32 *pir, void *regs);
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  		     struct dest_map *dest_map);
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9b4221471e3d..60e53fa2b554 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4383,7 +4383,7 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  	return;
>  }
>  
> -static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
>  {
>  	return;
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 207b9aa32915..1edefab54d01 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4852,17 +4852,6 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  		kvm_vcpu_kick(vcpu);
>  }
>  
> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	if (!pi_test_on(&vmx->pi_desc) ||
> -	    !pi_test_and_clear_on(&vmx->pi_desc))
> -		return;
> -
> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> -}
> -
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -8609,6 +8598,20 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  	}
>  }
>  
> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int max_irr;
> +
> +	if (!pi_test_on(&vmx->pi_desc) ||
> +	    !pi_test_and_clear_on(&vmx->pi_desc))
> +		return;
> +
> +	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> +	if (sync_rvi)
> +		vmx_hwapic_irr_update(vcpu, max_irr);
> +}
> +
>  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  {
>  	if (!kvm_vcpu_apicv_active(vcpu))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 604cfbfc5bee..148e14fdc55d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2821,7 +2821,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>  				    struct kvm_lapic_state *s)
>  {
>  	if (vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>  
>  	return kvm_apic_get_state(vcpu, s);
>  }
> @@ -6448,7 +6448,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
>  	else {
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->sync_pir_to_irr(vcpu);
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>  		kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
>  	}
>  	bitmap_or((ulong *)eoi_exit_bitmap, vcpu->arch.ioapic_handled_vectors,
> @@ -6611,8 +6611,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		 * virtual interrupt delivery.
>  		 */
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->hwapic_irr_update(vcpu,
> -				kvm_lapic_find_highest_irr(vcpu));
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, true);
>  	}
>  
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> -- 
> 1.8.3.1
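For reference, here is a hedged userspace sketch of the patched __kvm_apic_update_irr. The xchg becomes a plain read-and-clear, so this single-threaded port ignores the racing sender the kernel variant handles; highest_bit stands in for the kernel's __fls, and for non-zero input it equals fls(x) - 1, which is why the find_highest_vector hunk above is a safe swap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_IRR 0x200   /* IRR offset within the virtual-APIC page */

/* Userspace stand-in for the kernel's __fls(): highest set bit, x != 0. */
static int highest_bit(uint32_t x)
{
	return 31 - __builtin_clz(x);
}

/* Single-threaded model of the patched __kvm_apic_update_irr: merge the
 * posted-interrupt requests into IRR and return the highest pending
 * vector, or -1 if none is pending. */
static int apic_update_irr(uint32_t *pir, uint8_t *regs)
{
	int max_irr = -1;

	for (uint32_t i = 0, vec = 0; i <= 7; i++, vec += 32) {
		uint32_t *irr_reg = (uint32_t *)(regs + APIC_IRR + i * 0x10);
		uint32_t pir_val = pir[i];   /* kernel: READ_ONCE(pir[i]) */
		uint32_t irr_val = *irr_reg;

		if (pir_val) {
			pir[i] = 0;          /* kernel: xchg(&pir[i], 0) */
			irr_val |= pir_val;
			*irr_reg = irr_val;
		}
		if (irr_val)
			max_irr = highest_bit(irr_val) + vec;
	}
	return max_irr;
}
```

Note that, as in the patch, the loop keeps scanning all eight IRR registers even after merging PIR, so the returned max_irr reflects bits that were already in IRR as well as the newly posted ones.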


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 13:46       ` Michael S. Tsirkin
@ 2016-09-28 14:08         ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 14:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar



On 28/09/2016 15:46, Michael S. Tsirkin wrote:
> On Wed, Sep 28, 2016 at 10:21:41AM +0200, Paolo Bonzini wrote:
>> Basically the order for interrupt injection is:
>>
>> (1)	set PIR
>> 	smp_wmb()
> 
> Empty on x86 btw but good for reasoning about barrier
> pairing. So - where's the paired smp_rmb? See below.
> 
>> (2)	set ON
>> 	smp_mb()
> 
> This one can be combined to smp_store_mb to save
> a couple of cycles.

Yeah, this was very much a pseudo-code view.  In reality it's a
test_and_set_bit().  If ON=1 already there's no need to go on with 3/4.

>> (3)	read vcpu->mode
>> 	if IN_GUEST_MODE
>> (4a)		send posted interrupt IPI
>> 	else
>> (4b)		kick (i.e. cmpxchg vcpu->mode from IN_GUEST_MODE to
>> 		      EXITING_GUEST_MODE and send reschedule IPI)
>>
>> while the order for entering the guest must be the opposite.  The
>> numbers on the left identify the pairing between interrupt injection and
>> vcpu_enter_guest:
>>
>> (4a)	enable posted interrupt processing (i.e. disable interrupts!)
>> (3)	set vcpu->mode to IN_GUEST_MODE
>> 	smp_mb()
> 
> This one can be combined to smp_store_mb to save
> a couple of cycles.

Here the actual code has smp_mb__after_srcu_read_unlock.

>> (2)	read ON
>> 	if ON then
> 
> do we need smp_rmb here?

Yes, we have a test_and_clear here, which has an implicit barrier.

Paolo

> 
>> (1)		read PIR
>> 		sync PIR to IRR
>> (4b)	read vcpu->mode
>> 	if vcpu->mode == EXITING_GUEST_MODE then
>> 		cancel vmentry
>> (3/2/1)		# posted interrupts are processed on the next vmentry
>>
>> Paolo
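The point about the implicit barrier can be sketched with C11 atomics. This is an illustrative model, not the kernel code: the ON test_and_clear is an atomic read-modify-write, and its acquire half already orders the subsequent PIR reads against the sender's writes, so no separate smp_rmb is needed on the reader side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for one 32-bit chunk of PIR and for PIR.ON. */
static _Atomic uint32_t pir0;
static atomic_bool on;

/* Sender: (1) set PIR, smp_wmb(); (2) set ON.  The release store plays
 * the role of the kernel's barrier pairing. */
static void post_interrupt(uint32_t vec_mask)
{
	atomic_fetch_or_explicit(&pir0, vec_mask, memory_order_relaxed);
	atomic_store_explicit(&on, true, memory_order_release);
}

/* Receiver: test_and_clear ON, then read PIR.  The acquire half of the
 * RMW pairs with the sender's release, so a reader that observed ON == 1
 * is guaranteed to see the PIR bits written before ON was set. */
static uint32_t scan_pir(void)
{
	if (!atomic_exchange_explicit(&on, false, memory_order_acq_rel))
		return 0;                    /* ON was clear: nothing posted */
	return atomic_exchange_explicit(&pir0, 0, memory_order_relaxed);
}
```

On x86 the kernel's test_and_clear_bit is in fact a full barrier (a locked RMW), which is stronger than the acquire ordering this sketch relies on.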


* Re: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-09-28 14:00   ` Michael S. Tsirkin
@ 2016-09-28 14:44     ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-28 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, rkrcmar



On 28/09/2016 16:00, Michael S. Tsirkin wrote:
> On Tue, Sep 27, 2016 at 11:20:13PM +0200, Paolo Bonzini wrote:
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index be8b7ad56dd1..85b79b90e3d0 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -317,7 +317,7 @@ static int find_highest_vector(void *bitmap)
>>  	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
>>  		reg = bitmap + REG_POS(vec);
>>  		if (*reg)
>> -			return fls(*reg) - 1 + vec;
>> +			return __fls(*reg) + vec;
>>  	}
>>  
>>  	return -1;
> 
> We checked that *reg is != 0 so __fls is safe.
> It's a correct micro-optimization but why stick it in this patch?

Just because I'm using __fls below in __kvm_apic_update_irr.

Paolo

>> @@ -337,25 +337,32 @@ static u8 count_vectors(void *bitmap)
>>  	return count;
>>  }
>>  
>> -void __kvm_apic_update_irr(u32 *pir, void *regs)
>> +int __kvm_apic_update_irr(u32 *pir, void *regs)
>>  {
>> -	u32 i, pir_val;
>> +	u32 i, vec;
>> +	u32 pir_val, irr_val;
>> +	int max_irr = -1;
>>  
>> -	for (i = 0; i <= 7; i++) {
>> +	for (i = vec = 0; i <= 7; i++, vec += 32) {
>>  		pir_val = READ_ONCE(pir[i]);
>> +		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
>>  		if (pir_val) {
>> -			pir_val = xchg(&pir[i], 0);
>> -			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
>> +			irr_val |= xchg(&pir[i], 0);
>> +			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
>>  		}
>> +		if (irr_val)
>> +			max_irr = __fls(irr_val) + vec;
>>  	}
>> +
>> +	return max_irr;
>>  }
>>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
>>  
>> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>>  {
>>  	struct kvm_lapic *apic = vcpu->arch.apic;
>>  
>> -	__kvm_apic_update_irr(pir, apic->regs);
>> +	return __kvm_apic_update_irr(pir, apic->regs);
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>>  
>> @@ -376,7 +383,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>>  		return -1;
>>  
>>  	if (apic->vcpu->arch.apicv_active)
>> -		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>> +		kvm_x86_ops->sync_pir_to_irr(apic->vcpu, false);
>>  	result = apic_search_irr(apic);
>>  	ASSERT(result == -1 || result >= 16);
>>  
>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>> index f60d01c29d51..fc72828c3782 100644
>> --- a/arch/x86/kvm/lapic.h
>> +++ b/arch/x86/kvm/lapic.h
>> @@ -70,8 +70,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>>  bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
>>  			   int short_hand, unsigned int dest, int dest_mode);
>>  
>> -void __kvm_apic_update_irr(u32 *pir, void *regs);
>> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>> +int __kvm_apic_update_irr(u32 *pir, void *regs);
>> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>>  		     struct dest_map *dest_map);
>>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 9b4221471e3d..60e53fa2b554 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -4383,7 +4383,7 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>>  	return;
>>  }
>>  
>> -static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>> +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
>>  {
>>  	return;
>>  }
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 207b9aa32915..1edefab54d01 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4852,17 +4852,6 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>>  		kvm_vcpu_kick(vcpu);
>>  }
>>  
>> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>> -{
>> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> -
>> -	if (!pi_test_on(&vmx->pi_desc) ||
>> -	    !pi_test_and_clear_on(&vmx->pi_desc))
>> -		return;
>> -
>> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>> -}
>> -
>>  /*
>>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>>   * will not change in the lifetime of the guest.
>> @@ -8609,6 +8598,20 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>>  	}
>>  }
>>  
>> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
>> +{
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +	int max_irr;
>> +
>> +	if (!pi_test_on(&vmx->pi_desc) ||
>> +	    !pi_test_and_clear_on(&vmx->pi_desc))
>> +		return;
>> +
>> +	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>> +	if (sync_rvi)
>> +		vmx_hwapic_irr_update(vcpu, max_irr);
>> +}
>> +
>>  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>>  {
>>  	if (!kvm_vcpu_apicv_active(vcpu))
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 604cfbfc5bee..148e14fdc55d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2821,7 +2821,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>>  				    struct kvm_lapic_state *s)
>>  {
>>  	if (vcpu->arch.apicv_active)
>> -		kvm_x86_ops->sync_pir_to_irr(vcpu);
>> +		kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>>  
>>  	return kvm_apic_get_state(vcpu, s);
>>  }
>> @@ -6448,7 +6448,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>>  		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
>>  	else {
>>  		if (vcpu->arch.apicv_active)
>> -			kvm_x86_ops->sync_pir_to_irr(vcpu);
>> +			kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>>  		kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
>>  	}
>>  	bitmap_or((ulong *)eoi_exit_bitmap, vcpu->arch.ioapic_handled_vectors,
>> @@ -6611,8 +6611,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  		 * virtual interrupt delivery.
>>  		 */
>>  		if (vcpu->arch.apicv_active)
>> -			kvm_x86_ops->hwapic_irr_update(vcpu,
>> -				kvm_lapic_find_highest_irr(vcpu));
>> +			kvm_x86_ops->sync_pir_to_irr(vcpu, true);
>>  	}
>>  
>>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> -- 
>> 1.8.3.1
> 


* RE: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
  2016-09-28 14:00   ` Michael S. Tsirkin
@ 2016-09-29  2:51   ` Wu, Feng
  2016-10-14  7:32   ` Yang Zhang
  2 siblings, 0 replies; 29+ messages in thread
From: Wu, Feng @ 2016-09-29  2:51 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yang.zhang.wz, mst, rkrcmar, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Wednesday, September 28, 2016 5:20 AM
> To: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Cc: yang.zhang.wz@gmail.com; Wu, Feng <feng.wu@intel.com>;
> mst@redhat.com; rkrcmar@redhat.com
> Subject: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
> 
> Calling apic_find_highest_irr results in IRR being scanned twice,
> once in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
> vcpu_enter_guest to use sync_pir_to_irr (with a new argument to
> trigger the RVI write), and let sync_pir_to_irr get the new RVI
> directly from kvm_apic_update_irr.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Feng Wu <feng.wu@intel.com>

Thanks,
Feng

> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/lapic.c            | 25 ++++++++++++++++---------
>  arch/x86/kvm/lapic.h            |  4 ++--
>  arch/x86/kvm/svm.c              |  2 +-
>  arch/x86/kvm/vmx.c              | 25 ++++++++++++++-----------
>  arch/x86/kvm/x86.c              |  7 +++----
>  6 files changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> index 4b20f7304b9c..f73eea243567 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -941,7 +941,7 @@ struct kvm_x86_ops {
>  	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
>  	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> -	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
> +	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu, bool sync_rvi);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index be8b7ad56dd1..85b79b90e3d0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -317,7 +317,7 @@ static int find_highest_vector(void *bitmap)
>  	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
>  		reg = bitmap + REG_POS(vec);
>  		if (*reg)
> -			return fls(*reg) - 1 + vec;
> +			return __fls(*reg) + vec;
>  	}
> 
>  	return -1;
> @@ -337,25 +337,32 @@ static u8 count_vectors(void *bitmap)
>  	return count;
>  }
> 
> -void __kvm_apic_update_irr(u32 *pir, void *regs)
> +int __kvm_apic_update_irr(u32 *pir, void *regs)
>  {
> -	u32 i, pir_val;
> +	u32 i, vec;
> +	u32 pir_val, irr_val;
> +	int max_irr = -1;
> 
> -	for (i = 0; i <= 7; i++) {
> +	for (i = vec = 0; i <= 7; i++, vec += 32) {
>  		pir_val = READ_ONCE(pir[i]);
> +		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
>  		if (pir_val) {
> -			pir_val = xchg(&pir[i], 0);
> -			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +			irr_val |= xchg(&pir[i], 0);
> +			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
>  		}
> +		if (irr_val)
> +			max_irr = __fls(irr_val) + vec;
>  	}
> +
> +	return max_irr;
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
> 
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> 
> -	__kvm_apic_update_irr(pir, apic->regs);
> +	return __kvm_apic_update_irr(pir, apic->regs);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
> 
> @@ -376,7 +383,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic
> *apic)
>  		return -1;
> 
>  	if (apic->vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(apic->vcpu, false);
>  	result = apic_search_irr(apic);
>  	ASSERT(result == -1 || result >= 16);
> 
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index f60d01c29d51..fc72828c3782 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -70,8 +70,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset,
> int len,
>  bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
>  			   int short_hand, unsigned int dest, int dest_mode);
> 
> -void __kvm_apic_update_irr(u32 *pir, void *regs);
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
> +int __kvm_apic_update_irr(u32 *pir, void *regs);
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  		     struct dest_map *dest_map);
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9b4221471e3d..60e53fa2b554 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4383,7 +4383,7 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu
> *vcpu, u64 *eoi_exit_bitmap)
>  	return;
>  }
> 
> -static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
>  {
>  	return;
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 207b9aa32915..1edefab54d01 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4852,17 +4852,6 @@ static void vmx_deliver_posted_interrupt(struct
> kvm_vcpu *vcpu, int vector)
>  		kvm_vcpu_kick(vcpu);
>  }
> 
> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	if (!pi_test_on(&vmx->pi_desc) ||
> -	    !pi_test_and_clear_on(&vmx->pi_desc))
> -		return;
> -
> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> -}
> -
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -8609,6 +8598,20 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu
> *vcpu, int max_irr)
>  	}
>  }
> 
> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int max_irr;
> +
> +	if (!pi_test_on(&vmx->pi_desc) ||
> +	    !pi_test_and_clear_on(&vmx->pi_desc))
> +		return;
> +
> +	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> +	if (sync_rvi)
> +		vmx_hwapic_irr_update(vcpu, max_irr);
> +}
> +
>  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
> *eoi_exit_bitmap)
>  {
>  	if (!kvm_vcpu_apicv_active(vcpu))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 604cfbfc5bee..148e14fdc55d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2821,7 +2821,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu
> *vcpu,
>  				    struct kvm_lapic_state *s)
>  {
>  	if (vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(vcpu, false);
> 
>  	return kvm_apic_get_state(vcpu, s);
>  }
> @@ -6448,7 +6448,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  		kvm_scan_ioapic_routes(vcpu, vcpu-
> >arch.ioapic_handled_vectors);
>  	else {
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->sync_pir_to_irr(vcpu);
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>  		kvm_ioapic_scan_entry(vcpu, vcpu-
> >arch.ioapic_handled_vectors);
>  	}
>  	bitmap_or((ulong *)eoi_exit_bitmap, vcpu-
> >arch.ioapic_handled_vectors,
> @@ -6611,8 +6611,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		 * virtual interrupt delivery.
>  		 */
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->hwapic_irr_update(vcpu,
> -				kvm_lapic_find_highest_irr(vcpu));
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, true);
>  	}
> 
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> --
> 1.8.3.1


* Re: [RFC PATCH 0/3] kvm: x86: speedups for APICv
  2016-09-27 21:20 [RFC PATCH 0/3] kvm: x86: speedups for APICv Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
@ 2016-09-29 19:55 ` Radim Krčmář
  2016-09-29 21:41   ` Paolo Bonzini
  3 siblings, 1 reply; 29+ messages in thread
From: Radim Krčmář @ 2016-09-29 19:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang.zhang.wz, feng.wu, mst

2016-09-27 23:20+0200, Paolo Bonzini:
> The difficult one is patch 2.  I find the new code easier to follow than
> the old one, but it doesn't mean it works. :)  The aim is for APICv to
> not use KVM_REQ_EVENT at all for interrupts, therefore turning APICv's
> weakness (having to look at PIR on every vmentry) into a strength
> (because checking PIR.ON is cheaper than processing KVM_REQ_EVENT).

Makes sense.

Another possible optimization: when delivering an IPI, don't write the
vector to PIR, but directly to VIRR.  If the guest is not in VMX
non-root mode, then vm entry will take care of the injection; in the
other case, we'll send POSTED_INTR_VECTOR.
It seems that we don't even have to set PI.ON -- the SDM doesn't say it is
necessary to evaluate pending virtual interrupts after receiving the
notification interrupt.  If we have to set PI.ON, we can just skip the
PIR->VIRR sync as long as the VM doesn't have an assigned device,
because we know that PIR is empty.

And a more far-fetched one: if we know that PI.ON is set before vm
entry, we could just send POSTED_INTR_VECTOR self-IPI after masking
interrupts and let APICv copy PIR to IRR and deliver interrupts.
There are two possible drawbacks: Is the self-IPI overhead too big?
Would APICv IRR evaluation at vm entry take precedence, so we'd have a big
interrupt priority inversion window?


* Re: [RFC PATCH 0/3] kvm: x86: speedups for APICv
  2016-09-29 19:55 ` [RFC PATCH 0/3] kvm: x86: speedups for APICv Radim Krčmář
@ 2016-09-29 21:41   ` Paolo Bonzini
  2016-09-30 13:23     ` Radim Krčmář
  2016-09-30 13:33     ` Radim Krčmář
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-09-29 21:41 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, yang zhang wz, feng wu, mst


> Another possible optimization: when delivering an IPI, don't write the
> vector to PIR, but directly to VIRR.  If the guest is not in VMX
> non-root mode, then vm entry will take care of the injection; in the
> other case, we'll send POSTED_INTR_VECTOR.
> It seems that we don't even have to set PI.ON -- SDM doesn't say it is
> necessary to evaluate pending virtual interrupts after receiving the
> notification interrupt.  If we have to set PI.ON, we can just skip the
> PIR->VIRR sync as long as the VM doesn't have an assigned device,
> because we know that PIR is empty.

Nope, you cannot write to the APIC page while the VM is running.
(We're already reading the manual in such a way as to "allow" us to
write TMR while the VM is running, but that should not be extended.
For example the SDM doesn't say that the processor accesses VIRR with
atomic instructions, in fact it probably doesn't).

> And a more far-fetched one: if we know that PI.ON is set before vm
> entry, we could just send POSTED_INTR_VECTOR self-IPI after masking
> interrupts and let APICv copy PIR to IRR and deliver interrupts.
> There are two possible drawbacks: Is the self-IPI overhead too big?
> Would APICv IRR evaluation at vm entry take precedence, so we'd have big
> interrupt priority inversion window?

I don't think there is a risk of inverting interrupt priority, because
that race is always present.  But the overhead is probably too much: the
cost of the one xchg in __kvm_apic_update_irr is probably half of the whole
IRR update if the PI descriptor cacheline bounces.

Paolo


* Re: [RFC PATCH 0/3] kvm: x86: speedups for APICv
  2016-09-29 21:41   ` Paolo Bonzini
@ 2016-09-30 13:23     ` Radim Krčmář
  2016-09-30 13:33     ` Radim Krčmář
  1 sibling, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2016-09-30 13:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang zhang wz, feng wu, mst

2016-09-29 17:41-0400, Paolo Bonzini:
>> Another possible optimization: when delivering an IPI, don't write the
>> vector to PIR, but directly to VIRR.  If the guest is not in VMX
>> non-root mode, then vm entry will take care of the injection; in the
>> other case, we'll send POSTED_INTR_VECTOR.
>> It seems that we don't even have to set PI.ON -- SDM doesn't say it is
>> necessary to evaluate pending virtual interrupts after receiving the
>> notification interrupt.  If we have to set PI.ON, we can just skip the
>> PIR->VIRR sync as long as the VM doesn't have an assigned device,
>> because we know that PIR is empty.
> 
> Nope, you cannot write to the APIC page while the VM is running.

True, thanks.

> (We're already reading the manual in such a way as to "allow" us to
> write TMR while the VM is running, but that should not be extended.
> For example the SDM doesn't say that the processor accesses VIRR with
> atomic instructions, in fact it probably doesn't).

Yeah, TMR is not used by the hardware ...


* Re: [RFC PATCH 0/3] kvm: x86: speedups for APICv
  2016-09-29 21:41   ` Paolo Bonzini
  2016-09-30 13:23     ` Radim Krčmář
@ 2016-09-30 13:33     ` Radim Krčmář
  1 sibling, 0 replies; 29+ messages in thread
From: Radim Krčmář @ 2016-09-30 13:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, yang zhang wz, feng wu, mst

2016-09-29 17:41-0400, Paolo Bonzini:
>> And a more far-fetched one: if we know that PI.ON is set before vm
>> entry, we could just send POSTED_INTR_VECTOR self-IPI after masking
>> interrupts and let APICv copy PIR to IRR and deliver interrupts.
>> There are two possible drawbacks: Is the self-IPI overhead too big?
>> Would APICv IRR evaluation at vm entry take precedence, so we'd have big
>> interrupt priority inversion window?
> 
> I don't think there is a risk of inverting interrupt priority, because
> that race is always present.  But the overhead is probably too much, the
> cost of the one xchg in __apic_update_irr is probably half of the whole
> IRR update if the PI descriptor cacheline bounces.

Yep, I just ran the vmexit benchmark from kvm-unit-tests -- the cpuid and hypercall
tests are ~1000 cycles slower if I send the notification self-IPI, which
should be far more than PIR->IRR + vmcs_write(RVI, fls(IRR)).


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
  2016-09-27 23:07   ` Michael S. Tsirkin
  2016-09-28 10:04   ` Wu, Feng
@ 2016-10-14  7:12   ` Yang Zhang
  2 siblings, 0 replies; 29+ messages in thread
From: Yang Zhang @ 2016-10-14  7:12 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: feng.wu, mst, rkrcmar

On 2016/9/28 5:20, Paolo Bonzini wrote:
> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked", 2015-09-18) the posted interrupt descriptor is checked
> unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
> the complicated event injection path.
>
> However, there is a race between vmx_deliver_posted_interrupt and
> vcpu_enter_guest.  Fix it by disabling interrupts before vcpu->mode is
> set to IN_GUEST_MODE.
>
> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
> there since APICv was introduced.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 2 --
>  arch/x86/kvm/vmx.c   | 8 +++++---
>  arch/x86/kvm/x86.c   | 9 +++++++--
>  3 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 63a442aefc12..be8b7ad56dd1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>
>  	__kvm_apic_update_irr(pir, apic->regs);
> -
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b33eee395b00..207b9aa32915 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>  		return;
>
> -	r = pi_test_and_set_on(&vmx->pi_desc);
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);

Hi Paolo

I remember that a request is necessary before the vcpu kick; otherwise the
interrupt cannot be serviced in time. In this case, the posted
interrupt delivery could occur between a and c:

vcpu_enter_guest:
a. check for a pending interrupt
b. an interrupt is delivered from another vcpu
   (vmx_deliver_posted_interrupt() is called)
c. vcpu->mode = IN_GUEST_MODE;

Previously, the vcpu would become aware of the pending request (interrupt)
after c:
     if (vcpu->request)
         goto cancel_injection

With this patch there is no request, so the vcpu continues entering the
guest without handling the pending interrupt (kvm_vcpu_kick does nothing
because the mode isn't IN_GUEST_MODE).

Can this case happen?

> -	if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
> +	/* If a previous notification has sent the IPI, nothing to do.  */
> +	if (pi_test_and_set_on(&vmx->pi_desc))
> +		return;
> +
> +	if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>  		kvm_vcpu_kick(vcpu);
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3ee8a91a78c3..604cfbfc5bee 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>  	if (vcpu->fpu_active)
>  		kvm_load_guest_fpu(vcpu);
> +
> +	/*
> +	 * Disable IRQs before setting IN_GUEST_MODE, so that
> +	 * posted interrupts with vcpu->mode == IN_GUEST_MODE
> +	 * always result in virtual interrupt delivery.
> +	 */
> +	local_irq_disable();
>  	vcpu->mode = IN_GUEST_MODE;
>
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> @@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 */
>  	smp_mb__after_srcu_read_unlock();
>
> -	local_irq_disable();
> -
>  	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>


-- 
Yang
Alibaba Cloud Computing


* Re: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
  2016-09-28 14:00   ` Michael S. Tsirkin
  2016-09-29  2:51   ` Wu, Feng
@ 2016-10-14  7:32   ` Yang Zhang
  2016-10-14  7:45     ` Paolo Bonzini
  2 siblings, 1 reply; 29+ messages in thread
From: Yang Zhang @ 2016-10-14  7:32 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: feng.wu, mst, rkrcmar

On 2016/9/28 5:20, Paolo Bonzini wrote:
> Calling apic_find_highest_irr results in IRR being scanned twice,
> once in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
> vcpu_enter_guest to use sync_pir_to_irr (with a new argument to
> trigger the RVI write), and let sync_pir_to_irr get the new RVI
> directly from kvm_apic_update_irr.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/lapic.c            | 25 ++++++++++++++++---------
>  arch/x86/kvm/lapic.h            |  4 ++--
>  arch/x86/kvm/svm.c              |  2 +-
>  arch/x86/kvm/vmx.c              | 25 ++++++++++++++-----------
>  arch/x86/kvm/x86.c              |  7 +++----
>  6 files changed, 37 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4b20f7304b9c..f73eea243567 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -941,7 +941,7 @@ struct kvm_x86_ops {
>  	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
>  	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> -	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
> +	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu, bool sync_rvi);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index be8b7ad56dd1..85b79b90e3d0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -317,7 +317,7 @@ static int find_highest_vector(void *bitmap)
>  	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
>  		reg = bitmap + REG_POS(vec);
>  		if (*reg)
> -			return fls(*reg) - 1 + vec;
> +			return __fls(*reg) + vec;
>  	}
>
>  	return -1;
> @@ -337,25 +337,32 @@ static u8 count_vectors(void *bitmap)
>  	return count;
>  }
>
> -void __kvm_apic_update_irr(u32 *pir, void *regs)
> +int __kvm_apic_update_irr(u32 *pir, void *regs)
>  {
> -	u32 i, pir_val;
> +	u32 i, vec;
> +	u32 pir_val, irr_val;
> +	int max_irr = -1;
>
> -	for (i = 0; i <= 7; i++) {
> +	for (i = vec = 0; i <= 7; i++, vec += 32) {

How about ignoring the first 32 vectors? They cannot be used as
normal interrupts, except for the NMI, which is handled specially.

>  		pir_val = READ_ONCE(pir[i]);
> +		irr_val = *((u32 *)(regs + APIC_IRR + i * 0x10));
>  		if (pir_val) {
> -			pir_val = xchg(&pir[i], 0);
> -			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +			irr_val |= xchg(&pir[i], 0);
> +			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
>  		}
> +		if (irr_val)
> +			max_irr = __fls(irr_val) + vec;
>  	}
> +
> +	return max_irr;
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
>
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>
> -	__kvm_apic_update_irr(pir, apic->regs);
> +	return __kvm_apic_update_irr(pir, apic->regs);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>
> @@ -376,7 +383,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  		return -1;
>
>  	if (apic->vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(apic->vcpu, false);
>  	result = apic_search_irr(apic);

You have already got max_irr in sync_pir_to_irr; the
apic_search_irr() call seems redundant here.

>  	ASSERT(result == -1 || result >= 16);

I think the result must be >= 32, but it seems this code dates back to
when KVM was first merged into the kernel.

>
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index f60d01c29d51..fc72828c3782 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -70,8 +70,8 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
>  			   int short_hand, unsigned int dest, int dest_mode);
>
> -void __kvm_apic_update_irr(u32 *pir, void *regs);
> -void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
> +int __kvm_apic_update_irr(u32 *pir, void *regs);
> +int kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  		     struct dest_map *dest_map);
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9b4221471e3d..60e53fa2b554 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4383,7 +4383,7 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  	return;
>  }
>
> -static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
>  {
>  	return;
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 207b9aa32915..1edefab54d01 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4852,17 +4852,6 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  		kvm_vcpu_kick(vcpu);
>  }
>
> -static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	if (!pi_test_on(&vmx->pi_desc) ||
> -	    !pi_test_and_clear_on(&vmx->pi_desc))
> -		return;
> -
> -	kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> -}
> -
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -8609,6 +8598,20 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  	}
>  }
>
> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu, bool sync_rvi)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int max_irr;
> +
> +	if (!pi_test_on(&vmx->pi_desc) ||
> +	    !pi_test_and_clear_on(&vmx->pi_desc))
> +		return;
> +
> +	max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> +	if (sync_rvi)
> +		vmx_hwapic_irr_update(vcpu, max_irr);
> +}
> +
>  static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  {
>  	if (!kvm_vcpu_apicv_active(vcpu))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 604cfbfc5bee..148e14fdc55d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2821,7 +2821,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>  				    struct kvm_lapic_state *s)
>  {
>  	if (vcpu->arch.apicv_active)
> -		kvm_x86_ops->sync_pir_to_irr(vcpu);
> +		kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>
>  	return kvm_apic_get_state(vcpu, s);
>  }
> @@ -6448,7 +6448,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
>  	else {
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->sync_pir_to_irr(vcpu);
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, false);
>  		kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
>  	}
>  	bitmap_or((ulong *)eoi_exit_bitmap, vcpu->arch.ioapic_handled_vectors,
> @@ -6611,8 +6611,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		 * virtual interrupt delivery.
>  		 */
>  		if (vcpu->arch.apicv_active)
> -			kvm_x86_ops->hwapic_irr_update(vcpu,
> -				kvm_lapic_find_highest_irr(vcpu));
> +			kvm_x86_ops->sync_pir_to_irr(vcpu, true);
>  	}
>
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>


-- 
Yang
Alibaba Cloud Computing


* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-09-28 11:50         ` Paolo Bonzini
  2016-09-28 12:06           ` Wu, Feng
@ 2016-10-14  7:37           ` Yang Zhang
  2016-10-14  8:12             ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Yang Zhang @ 2016-10-14  7:37 UTC (permalink / raw)
  To: Paolo Bonzini, Wu, Feng, Michael S. Tsirkin; +Cc: linux-kernel, kvm, rkrcmar

On 2016/9/28 19:50, Paolo Bonzini wrote:
>
>
> On 28/09/2016 13:40, Wu, Feng wrote:
>> IIUIC, the issue you describe above is that the IPI for posted interrupts may be
>> issued between
>>
>> vcpu->mode = IN_GUEST_MODE;
>>
>> and
>>
>> local_irq_disable();
>>
>> But if that really happens, we will call kvm_vcpu_kick() in
>> vmx_deliver_posted_interrupt(), hence the vcpu->mode will be changed
>> to EXITING_GUEST_MODE, then we will goto cancel_injection in
>> vcpu_enter_guest, so the posted interrupt will be delivered to the guest
>> on the next vmentry.  It seems I cannot see the problem. Am I missing something?
>
> No, if that happens kvm_trigger_posted_interrupt returns true, hence
> kvm_vcpu_kick is not called.  With the fix, the IPI is processed as soon
> as the guest enters non-root mode, and the interrupt is injected.
>
>
> The other issue occurs when the IPI is sent between
>
>                         kvm_x86_ops->hwapic_irr_update(vcpu,
>                                 kvm_lapic_find_highest_irr(vcpu));
>
> and
>
> 	vcpu->mode = IN_GUEST_MODE;
>
> In this case, kvm_vcpu_kick is called but it (correctly) doesn't do
> anything because it sees vcpu->mode == OUTSIDE_GUEST_MODE.  Then the
> guest is entered with PIR.ON, but the PI interrupt is not pending and
> hence the interrupt is never delivered to the guest.  The fix for this
> is to move the RVI update after IN_GUEST_MODE.  Then the source CPU uses
> the posted interrupt IPI instead of kvm_vcpu_kick, and everything works.

Please ignore my previous reply. It seems you were already aware of the
issue and have a fix for it. :-)


-- 
Yang
Alibaba Cloud Computing


* Re: [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry
  2016-10-14  7:32   ` Yang Zhang
@ 2016-10-14  7:45     ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-10-14  7:45 UTC (permalink / raw)
  To: Yang Zhang, linux-kernel, kvm; +Cc: feng.wu, mst, rkrcmar



On 14/10/2016 09:32, Yang Zhang wrote:
>>
>> -void __kvm_apic_update_irr(u32 *pir, void *regs)
>> +int __kvm_apic_update_irr(u32 *pir, void *regs)
>>  {
>> -    u32 i, pir_val;
>> +    u32 i, vec;
>> +    u32 pir_val, irr_val;
>> +    int max_irr = -1;
>>
>> -    for (i = 0; i <= 7; i++) {
>> +    for (i = vec = 0; i <= 7; i++, vec += 32) {
> 
> how about ignoring the first 32 vectors, since they cannot be used as
> normal interrupts except the NMI, which is handled specially.

I think that, while they should not be used as normal interrupts,
there's nothing that prevents you from misusing them as normal interrupts.
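For reference, the loop in the hunk quoted above amounts to ORing the eight 32-bit PIR words into the IRR while tracking the highest set bit. A standalone sketch follows — hypothetical: it uses plain loads and stores instead of the kernel's xchg(), and GCC's __builtin_clz for the bit scan.

```c
#include <stdint.h>

/* Sketch of the __kvm_apic_update_irr() change: merge each PIR word into
 * the corresponding IRR word and return the highest pending vector, or -1
 * if the IRR ends up empty. */
static int apic_update_irr_sketch(uint32_t pir[8], uint32_t irr[8])
{
	int max_irr = -1;

	for (int i = 0, vec = 0; i <= 7; i++, vec += 32) {
		uint32_t pir_val = pir[i];	/* kernel: xchg(&pir[i], 0) */
		uint32_t irr_val = irr[i] | pir_val;

		pir[i] = 0;
		irr[i] = irr_val;
		if (irr_val)
			max_irr = vec + 31 - __builtin_clz(irr_val);
	}
	return max_irr;
}
```

Returning max_irr from the merge is what lets the caller program RVI directly instead of raising KVM_REQ_EVENT and rescanning the IRR.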

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
  2016-10-14  7:37           ` Yang Zhang
@ 2016-10-14  8:12             ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-10-14  8:12 UTC (permalink / raw)
  To: Yang Zhang, Wu, Feng, Michael S. Tsirkin; +Cc: linux-kernel, kvm, rkrcmar



On 14/10/2016 09:37, Yang Zhang wrote:
> Please ignore my previous reply. It seems you are already aware of the
> issue and have a fix for it. :-)

Thanks anyway for confirming!  The final version of the patches is on
the way.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-10-14  8:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-27 21:20 [RFC PATCH 0/3] kvm: x86: speedups for APICv Paolo Bonzini
2016-09-27 21:20 ` [PATCH 1/3] kvm: x86: avoid atomic operations on APICv vmentry Paolo Bonzini
2016-09-27 21:20 ` [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection Paolo Bonzini
2016-09-27 23:07   ` Michael S. Tsirkin
2016-09-28  8:21     ` Paolo Bonzini
2016-09-28 11:40       ` Wu, Feng
2016-09-28 11:50         ` Paolo Bonzini
2016-09-28 12:06           ` Wu, Feng
2016-09-28 12:14             ` Paolo Bonzini
2016-10-14  7:37           ` Yang Zhang
2016-10-14  8:12             ` Paolo Bonzini
2016-09-28 13:46       ` Michael S. Tsirkin
2016-09-28 14:08         ` Paolo Bonzini
2016-09-28 10:04   ` Wu, Feng
2016-09-28 10:16     ` Paolo Bonzini
2016-09-28 11:53       ` Wu, Feng
2016-09-28 11:55         ` Paolo Bonzini
2016-09-28 12:07           ` Wu, Feng
2016-10-14  7:12   ` Yang Zhang
2016-09-27 21:20 ` [PATCH 3/3] KVM: x86: do not scan IRR twice on APICv vmentry Paolo Bonzini
2016-09-28 14:00   ` Michael S. Tsirkin
2016-09-28 14:44     ` Paolo Bonzini
2016-09-29  2:51   ` Wu, Feng
2016-10-14  7:32   ` Yang Zhang
2016-10-14  7:45     ` Paolo Bonzini
2016-09-29 19:55 ` [RFC PATCH 0/3] kvm: x86: speedups for APICv Radim Krčmář
2016-09-29 21:41   ` Paolo Bonzini
2016-09-30 13:23     ` Radim Krčmář
2016-09-30 13:33     ` Radim Krčmář
