* [PATCH v11 0/3] x86, apicv: Add APIC virtualization support
@ 2013-01-16 10:21 Yang Zhang
  2013-01-16 10:21 ` [PATCH v11 1/3] x86, apicv: add APICv register " Yang Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Yang Zhang @ 2013-01-16 10:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, xiantao.zhang, Yang Zhang

From: Yang Zhang <yang.z.zhang@Intel.com>

APIC virtualization is a new hardware feature which eliminates most of the VM
exits taken when a vcpu handles an interrupt:

APIC register virtualization:
        APIC read access doesn't cause APIC-access VM exits.
        APIC write becomes trap-like.

Virtual interrupt delivery:
        Virtual interrupt delivery relieves KVM of injecting vAPIC
        interrupts manually; delivery is fully taken care of by the hardware.

Please refer to Intel SDM volume 3, chapter 29 for more details.
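
As background for the "eoi exit bitmap" referenced throughout the changelog
below: with virtual interrupt delivery, most EOI writes complete in hardware,
and only vectors whose bit is set in a 256-bit VMCS bitmap still cause an
EOI-induced VM exit. A minimal user-space sketch of that bookkeeping (the
names are illustrative, not taken from this series):

#include <stdio.h>
#include <stdint.h>

/* Four 64-bit words cover vectors 0-255, mirroring the EOI_EXIT_BITMAP0-3
 * VMCS fields used in patch 3. */
static uint64_t eoi_exit_bitmap[4];

static void set_eoi_exit(unsigned int vector)
{
	eoi_exit_bitmap[vector / 64] |= 1ULL << (vector % 64);
}

static int eoi_exits(unsigned int vector)
{
	return !!(eoi_exit_bitmap[vector / 64] & (1ULL << (vector % 64)));
}

int main(void)
{
	set_eoi_exit(0x21);	/* e.g. a level-triggered ioapic vector */
	printf("0x21 exits: %d, 0x31 exits: %d\n",
	       eoi_exits(0x21), eoi_exits(0x31));
	return 0;
}
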
Changes v10 to v11:
 * Use two new msr bitmaps for guests that enable x2apic mode:
   Since the msr bitmap is shared by all guests, updating the global msr
   bitmap would break guests that are not using x2apic. To solve this, we
   use two new msr bitmaps for guests that are using x2apic.

Changes v9 to v10:
 * Enable virtualized x2apic mode when the guest is using x2apic and apicv:
   There is no point in enabling virtualized x2apic mode when apicv is disabled.
 * Grab ioapic_lock when traversing ioapic entries to set the eoi exit bitmap
 * Rebased on top of KVM upstream

Changes v8 to v9:
 * Update the eoi exit bitmap from the vcpu itself.
 * Enable virtualized x2apic mode when the guest is using x2apic.
 * Rebased on top of KVM upstream

Changes v7 to v8:
 * Following Marcelo's suggestion, add comments for irr_pending and isr_count,
   since the two variables have different meanings when using apicv.
 * Set the highest bit in vISR to SVI after migration.
 * Use a spinlock to synchronize access to the eoi exit bitmap.
 * Enable virtualized x2apic mode when the guest is using x2apic.
 * Rebased on top of KVM upstream.

Changes v6 to v7:
 * Fix a bug when setting the eoi exit bitmap.
 * Rebased on top of KVM upstream.

Changes v5 to v6:
 * Minor adjustments according to Gleb's comments.
 * Rebased on top of KVM upstream.

Changes v4 to v5:
 * Set the eoi exit bitmap when an interrupt has a notifier registered.
 * Use a request to track modifications of ioapic entries.
 * Rebased on top of KVM upstream.

Changes v3 to v4:
 * Use one option to control both register virtualization and virtual
   interrupt delivery.
 * Update the eoi exit bitmap when programming the ioapic or the apic's
   id/dfr/ldr.
 * Rebased on top of KVM upstream.

Changes v2 to v3:
 * Drop the Posted Interrupt patch from v3.
   Following Gleb's suggestion, we will use a global vector for all VCPUs as the
   notification event vector, so we will rewrite the Posted Interrupt patch and
   resend it later.
 * Use the TMR to set the eoi exiting bitmap. We only want to set the eoi
   exiting bitmap for those interrupts which are level-triggered or have a
   notifier in the EOI write path, so the TMR is enough to distinguish the
   interrupt trigger mode.
 * Simplify some code according to Gleb's comments.
 * Rebased on top of KVM upstream.

Changes v1 to v2:
 * Add Posted Interrupt support to this patch series.
 * Since there is a notifier hook in the vAPIC EOI path for the PIT interrupt,
   always set the PIT interrupt in the eoi exit bitmap to force a vmexit on
   EOI of that interrupt.
 * Rebased on top of KVM upstream

Yang Zhang (3):
  x86, apicv: add APICv register virtualization support
  x86, apicv: add virtual x2apic support
  x86, apicv: add virtual interrupt delivery support

 arch/ia64/kvm/lapic.h           |    6 +
 arch/x86/include/asm/kvm_host.h |    8 +
 arch/x86/include/asm/vmx.h      |   14 ++
 arch/x86/kvm/irq.c              |   56 ++++++-
 arch/x86/kvm/lapic.c            |   99 +++++++----
 arch/x86/kvm/lapic.h            |   25 +++
 arch/x86/kvm/svm.c              |   31 ++++
 arch/x86/kvm/vmx.c              |  366 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   11 +-
 arch/x86/kvm/x86.h              |    2 +
 include/linux/kvm_host.h        |    3 +
 virt/kvm/ioapic.c               |   38 ++++
 virt/kvm/ioapic.h               |    1 +
 virt/kvm/irq_comm.c             |   25 +++
 virt/kvm/kvm_main.c             |    5 +
 15 files changed, 641 insertions(+), 49 deletions(-)



* [PATCH v11 1/3] x86, apicv: add APICv register virtualization support
  2013-01-16 10:21 [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
@ 2013-01-16 10:21 ` Yang Zhang
  2013-01-16 10:21 ` [PATCH v11 2/3] x86, apicv: add virtual x2apic support Yang Zhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Yang Zhang @ 2013-01-16 10:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, xiantao.zhang, Yang Zhang, Kevin Tian

- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like
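
(Illustration, not part of the patch: for a trap-like APIC-write exit, the
exit qualification carries the page offset of the access in its low 12 bits,
and APIC registers sit on 16-byte boundaries, which is where the 0xfff and
0xff0 masks in the code below come from. A standalone sketch of that
arithmetic:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t exit_qualification = 0x1084;	/* made-up example value */
	uint32_t offset = exit_qualification & 0xfff;

	offset &= 0xff0;	/* align down to the APIC register boundary */
	printf("APIC register offset: 0x%x\n", offset);	/* 0x80 == TPR */
	return 0;
}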

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
---
 arch/x86/include/asm/vmx.h |    2 ++
 arch/x86/kvm/lapic.c       |   15 +++++++++++++++
 arch/x86/kvm/lapic.h       |    2 ++
 arch/x86/kvm/vmx.c         |   33 ++++++++++++++++++++++++++++++++-
 4 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index e385df9..44c3f7e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -66,6 +66,7 @@
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
+#define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
 
 #define VMX_EXIT_REASONS \
@@ -141,6 +142,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9392f52..0664c13 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1212,6 +1212,21 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
+/* emulate an APIC register write in a trap-like manner */
+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
+{
+	u32 val = 0;
+
+	/* hw has done the conditional check and inst decode */
+	offset &= 0xff0;
+
+	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+
+	/* TODO: optimize to just emulate side effect w/o one more write */
+	apic_reg_write(vcpu->arch.apic, offset, val);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
+
 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index e5ebf9f..9a8ee22 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -64,6 +64,8 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dd2a85c..0403634 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,6 +84,9 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
+static bool __read_mostly enable_apicv_reg = 1;
+module_param(enable_apicv_reg, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -764,6 +767,12 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }
 
+static inline bool cpu_has_vmx_apic_register_virt(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_APIC_REGISTER_VIRT;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2540,7 +2549,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_UNRESTRICTED_GUEST |
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
-			SECONDARY_EXEC_ENABLE_INVPCID;
+			SECONDARY_EXEC_ENABLE_INVPCID |
+			SECONDARY_EXEC_APIC_REGISTER_VIRT;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2551,6 +2561,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
 		_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
 #endif
+
+	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
+		_cpu_based_2nd_exec_control &= ~(
+				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
 		   enabled */
@@ -2748,6 +2763,9 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;
 
+	if (!cpu_has_vmx_apic_register_virt())
+		enable_apicv_reg = 0;
+
 	if (nested)
 		nested_vmx_setup_ctls_msrs();
 
@@ -3828,6 +3846,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 	if (!ple_gap)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+	if (!enable_apicv_reg)
+		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
 	return exec_control;
 }
 
@@ -4789,6 +4809,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
+static int handle_apic_write(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	u32 offset = exit_qualification & 0xfff;
+
+	/* APIC-write VM exits are trap-like; no need to adjust the IP */
+	kvm_apic_write_nodecode(vcpu, offset);
+	return 1;
+}
+
 static int handle_task_switch(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -5723,6 +5753,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
+	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
-- 
1.7.1



* [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-16 10:21 [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
  2013-01-16 10:21 ` [PATCH v11 1/3] x86, apicv: add APICv register " Yang Zhang
@ 2013-01-16 10:21 ` Yang Zhang
  2013-01-17 13:22   ` Gleb Natapov
  2013-01-21 19:59   ` Marcelo Tosatti
  2013-01-16 10:21 ` [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
  2013-01-16 10:27 ` [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Gleb Natapov
  3 siblings, 2 replies; 34+ messages in thread
From: Yang Zhang @ 2013-01-16 10:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, xiantao.zhang, Yang Zhang, Kevin Tian

From: Yang Zhang <yang.z.zhang@Intel.com>

To benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when the guest is actually using x2apic.

Also, clear the MSR bitmap for the corresponding x2apic MSRs when the guest
enables x2apic:
    0x800 - 0x8ff: no read intercept for apicv register virtualization,
                   except APIC ID and TMCCT, which need software's assistance
                   to get the right value.
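
(Illustration, not part of the patch: an x2apic MSR index is the xAPIC MMIO
register offset shifted right by 4 and based at 0x800, which is where the
constants 0x802 (APIC ID), 0x808 (TPR) and 0x839 (TMCCT) used below come
from. A standalone sketch:)

#include <stdio.h>
#include <stdint.h>

static uint32_t x2apic_msr(uint32_t mmio_offset)
{
	return 0x800 + (mmio_offset >> 4);
}

int main(void)
{
	printf("APIC ID (0x020) -> MSR 0x%x\n", x2apic_msr(0x020));
	printf("TPR     (0x080) -> MSR 0x%x\n", x2apic_msr(0x080));
	printf("TMCCT   (0x390) -> MSR 0x%x\n", x2apic_msr(0x390));
	return 0;
}

(The MSR bitmap itself is a single 4KB page whose read-low, read-high,
write-low and write-high quadrants start at offsets 0x000, 0x400, 0x800 and
0xc00, which is what __vmx_enable_intercept_for_msr() and
__vmx_disable_intercept_for_msr() below index into.)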

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/include/asm/vmx.h      |    1 +
 arch/x86/kvm/lapic.c            |   20 ++--
 arch/x86/kvm/lapic.h            |    5 +
 arch/x86/kvm/svm.c              |    6 +
 arch/x86/kvm/vmx.c              |  204 +++++++++++++++++++++++++++++++++++----
 6 files changed, 209 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c431b33..35aa8e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,7 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 44c3f7e..0a54df0 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -139,6 +139,7 @@
 #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
 #define SECONDARY_EXEC_ENABLE_EPT               0x00000002
 #define SECONDARY_EXEC_RDTSCP			0x00000008
+#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0664c13..f39aee3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
 	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
 	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
 
-static inline int apic_x2apic_mode(struct kvm_lapic *apic)
-{
-	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
-}
-
 static inline int kvm_apic_id(struct kvm_lapic *apic)
 {
 	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
@@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 	if (!kvm_vcpu_is_bsp(apic->vcpu))
 		value &= ~MSR_IA32_APICBASE_BSP;
 
-	vcpu->arch.apic_base = value;
-	if (apic_x2apic_mode(apic)) {
-		u32 id = kvm_apic_id(apic);
-		u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
-		kvm_apic_set_ldr(apic, ldr);
+	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
+		if (value & X2APIC_ENABLE) {
+			u32 id = kvm_apic_id(apic);
+			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
+			kvm_apic_set_ldr(apic, ldr);
+			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
+		} else
+			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
 	}
+
+	vcpu->arch.apic_base = value;
 	apic->base_address = apic->vcpu->arch.apic_base &
 			     MSR_IA32_APICBASE_BASE;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9a8ee22..22a5397 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -126,4 +126,9 @@ static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
 	return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic);
 }
 
+static inline int apic_x2apic_mode(struct kvm_lapic *apic)
+{
+	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
+}
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d29d3cd..38407e9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
+static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
+{
+	return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0403634..526955d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -643,6 +643,8 @@ static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
 static unsigned long *vmx_msr_bitmap_longmode;
+static unsigned long *vmx_msr_bitmap_legacy_x2apic;
+static unsigned long *vmx_msr_bitmap_longmode_x2apic;
 
 static bool cpu_has_load_ia32_efer;
 static bool cpu_has_load_perf_global_ctrl;
@@ -767,6 +769,12 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }
 
+static inline bool cpu_has_vmx_virtualize_x2apic_mode(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+}
+
 static inline bool cpu_has_vmx_apic_register_virt(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -1830,6 +1838,24 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
 	vmx->guest_msrs[from] = tmp;
 }
 
+static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
+{
+	unsigned long *msr_bitmap;
+
+	if (apic_x2apic_mode(vcpu->arch.apic))
+		if (is_long_mode(vcpu))
+			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
+		else
+			msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
+	else
+		if (is_long_mode(vcpu))
+			msr_bitmap = vmx_msr_bitmap_longmode;
+		else
+			msr_bitmap = vmx_msr_bitmap_legacy;
+
+	vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
+}
+
 /*
  * Set up the vmcs to automatically save and restore system
  * msrs.  Don't touch the 64-bit msrs if the guest is in legacy
@@ -1838,7 +1864,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
 static void setup_msrs(struct vcpu_vmx *vmx)
 {
 	int save_nmsrs, index;
-	unsigned long *msr_bitmap;
 
 	save_nmsrs = 0;
 #ifdef CONFIG_X86_64
@@ -1870,14 +1895,8 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 
 	vmx->save_nmsrs = save_nmsrs;
 
-	if (cpu_has_vmx_msr_bitmap()) {
-		if (is_long_mode(&vmx->vcpu))
-			msr_bitmap = vmx_msr_bitmap_longmode;
-		else
-			msr_bitmap = vmx_msr_bitmap_legacy;
-
-		vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
-	}
+	if (cpu_has_vmx_msr_bitmap())
+		vmx_set_msr_bitmap(&vmx->vcpu);
 }
 
 /*
@@ -2543,6 +2562,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
 		min2 = 0;
 		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+			SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
@@ -3724,7 +3744,10 @@ static void free_vpid(struct vcpu_vmx *vmx)
 	spin_unlock(&vmx_vpid_lock);
 }
 
-static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
+#define MSR_TYPE_R	1
+#define MSR_TYPE_W	2
+static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+						u32 msr, int type)
 {
 	int f = sizeof(unsigned long);
 
@@ -3737,20 +3760,99 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
 	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
 	 */
 	if (msr <= 0x1fff) {
-		__clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */
-		__clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */
+		if (type & MSR_TYPE_R)
+			/* read-low */
+			__clear_bit(msr, msr_bitmap + 0x000 / f);
+
+		if (type & MSR_TYPE_W)
+			/* write-low */
+			__clear_bit(msr, msr_bitmap + 0x800 / f);
+
 	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
 		msr &= 0x1fff;
-		__clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */
-		__clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */
+		if (type & MSR_TYPE_R)
+			/* read-high */
+			__clear_bit(msr, msr_bitmap + 0x400 / f);
+
+		if (type & MSR_TYPE_W)
+			/* write-high */
+			__clear_bit(msr, msr_bitmap + 0xc00 / f);
+
+	}
+}
+
+static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
+						u32 msr, int type)
+{
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return;
+
+	/*
+	 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
+	 * have the write-low and read-high bitmap offsets the wrong way round.
+	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
+	 */
+	if (msr <= 0x1fff) {
+		if (type & MSR_TYPE_R)
+			/* read-low */
+			__set_bit(msr, msr_bitmap + 0x000 / f);
+
+		if (type & MSR_TYPE_W)
+			/* write-low */
+			__set_bit(msr, msr_bitmap + 0x800 / f);
+
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		if (type & MSR_TYPE_R)
+			/* read-high */
+			__set_bit(msr, msr_bitmap + 0x400 / f);
+
+		if (type & MSR_TYPE_W)
+			/* write-high */
+			__set_bit(msr, msr_bitmap + 0xc00 / f);
+
 	}
 }
 
 static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
 {
 	if (!longmode_only)
-		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr);
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr);
+		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy,
+						msr, MSR_TYPE_R | MSR_TYPE_W);
+	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode,
+						msr, MSR_TYPE_R | MSR_TYPE_W);
+}
+
+static void vmx_intercept_for_msr_read_x2apic(u32 msr, bool set)
+{
+	if (set) {
+		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
+				msr, MSR_TYPE_R);
+		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
+				msr, MSR_TYPE_R);
+	} else {
+		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
+				msr, MSR_TYPE_R);
+		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
+				msr, MSR_TYPE_R);
+	}
+}
+
+static void vmx_intercept_for_msr_write_x2apic(u32 msr, bool set)
+{
+	if (set) {
+		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
+				msr, MSR_TYPE_W);
+		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
+				msr, MSR_TYPE_W);
+	} else {
+		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
+				msr, MSR_TYPE_W);
+		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
+				msr, MSR_TYPE_W);
+	}
 }
 
 /*
@@ -3848,6 +3950,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!enable_apicv_reg)
 		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
+	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
 	return exec_control;
 }
 
@@ -6103,6 +6206,53 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	vmcs_write32(TPR_THRESHOLD, irr);
 }
 
+static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
+{
+	u32 exec_control, sec_exec_control;
+	int msr;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* There is no point in enabling virtualized x2apic mode without
+	 * apicv */
+	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
+		return;
+
+	if (set) {
+		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+		/* virtualized x2apic mode relies on the tpr shadow */
+		if (!(exec_control & CPU_BASED_TPR_SHADOW))
+			return;
+	}
+
+	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+	if (set) {
+		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+	} else {
+		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
+			sec_exec_control |=
+					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+	}
+	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
+
+	if (set) {
+		for (msr = 0x800; msr <= 0x8ff; msr++)
+			vmx_intercept_for_msr_read_x2apic(msr, false);
+
+		/* According to the SDM, in x2apic mode the whole id register
+		 * is used, but KVM only uses the highest eight bits, so reads
+		 * of it must stay intercepted. */
+		vmx_intercept_for_msr_read_x2apic(0x802, true);
+		/* TMCCT */
+		vmx_intercept_for_msr_read_x2apic(0x839, true);
+		/* TPR */
+		vmx_intercept_for_msr_write_x2apic(0x808, false);
+	}
+	vmx_set_msr_bitmap(vcpu);
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7366,6 +7516,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
@@ -7419,11 +7570,19 @@ static int __init vmx_init(void)
 	if (!vmx_msr_bitmap_legacy)
 		goto out1;
 
+	vmx_msr_bitmap_legacy_x2apic =
+				(unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_msr_bitmap_legacy_x2apic)
+		goto out2;
 
 	vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_msr_bitmap_longmode)
-		goto out2;
+		goto out3;
 
+	vmx_msr_bitmap_longmode_x2apic =
+				(unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_msr_bitmap_longmode_x2apic)
+		goto out4;
 
 	/*
 	 * Allow direct access to the PC debug port (it is often used for I/O
@@ -7456,6 +7615,11 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
 
+	memcpy(vmx_msr_bitmap_legacy_x2apic,
+			vmx_msr_bitmap_legacy, PAGE_SIZE);
+	memcpy(vmx_msr_bitmap_longmode_x2apic,
+			vmx_msr_bitmap_longmode, PAGE_SIZE);
+
 	if (enable_ept) {
 		kvm_mmu_set_mask_ptes(0ull,
 			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
@@ -7468,8 +7632,10 @@ static int __init vmx_init(void)
 
 	return 0;
 
-out3:
+out4:
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
+out3:
+	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
 out2:
 	free_page((unsigned long)vmx_msr_bitmap_legacy);
 out1:
@@ -7481,6 +7647,8 @@ out:
 
 static void __exit vmx_exit(void)
 {
+	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
+	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
 	free_page((unsigned long)vmx_msr_bitmap_legacy);
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
-- 
1.7.1



* [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-16 10:21 [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
  2013-01-16 10:21 ` [PATCH v11 1/3] x86, apicv: add APICv register " Yang Zhang
  2013-01-16 10:21 ` [PATCH v11 2/3] x86, apicv: add virtual x2apic support Yang Zhang
@ 2013-01-16 10:21 ` Yang Zhang
  2013-01-17  1:26   ` Zhang, Yang Z
  2013-01-21 21:05   ` Marcelo Tosatti
  2013-01-16 10:27 ` [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Gleb Natapov
  3 siblings, 2 replies; 34+ messages in thread
From: Yang Zhang @ 2013-01-16 10:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, haitao.shan, mtosatti, xiantao.zhang, Yang Zhang, Kevin Tian

From: Yang Zhang <yang.z.zhang@Intel.com>

Virtual interrupt delivery relieves KVM of injecting vAPIC interrupts
manually; delivery is fully taken care of by the hardware. This requires
some special awareness in the existing interrupt injection path:

- For a pending interrupt, instead of injecting it directly, we may need
  to update architecture-specific indicators before resuming the guest.

- A pending interrupt that is masked by the ISR should also be considered
  in the above update, since the hardware decides when to inject it at the
  right time. The current has_interrupt and get_interrupt only return a
  valid vector from an injection point of view.
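
(Background sketch, not part of the patch: the 16-bit GUEST_INTR_STATUS VMCS
field packs RVI, the requesting vector, in its low byte and SVI, the
in-service vector, in its high byte; vmx_set_rvi() and vmx_set_svi() below
update the two halves independently. A standalone illustration of the
packing:)

#include <stdio.h>
#include <stdint.h>

static uint16_t pack_svi(uint16_t status, uint8_t isr_vector)
{
	return (status & 0x00ffU) | ((uint16_t)isr_vector << 8);
}

static uint16_t pack_rvi(uint16_t status, uint8_t max_irr)
{
	return (status & 0xff00U) | max_irr;
}

int main(void)
{
	uint16_t status = 0;

	status = pack_rvi(status, 0x31);	/* highest pending vector */
	status = pack_svi(status, 0x21);	/* highest in-service vector */
	printf("GUEST_INTR_STATUS = 0x%04x\n", status);	/* 0x2131 */
	return 0;
}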

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 arch/ia64/kvm/lapic.h           |    6 ++
 arch/x86/include/asm/kvm_host.h |    7 ++
 arch/x86/include/asm/vmx.h      |   11 +++
 arch/x86/kvm/irq.c              |   56 +++++++++++-
 arch/x86/kvm/lapic.c            |   69 +++++++++------
 arch/x86/kvm/lapic.h            |   23 +++++
 arch/x86/kvm/svm.c              |   25 +++++
 arch/x86/kvm/vmx.c              |  184 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |   11 ++-
 arch/x86/kvm/x86.h              |    2 +
 include/linux/kvm_host.h        |    3 +
 virt/kvm/ioapic.c               |   38 ++++++++
 virt/kvm/ioapic.h               |    1 +
 virt/kvm/irq_comm.c             |   25 +++++
 virt/kvm/kvm_main.c             |    5 +
 15 files changed, 421 insertions(+), 45 deletions(-)

diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index c5f92a9..c3e2935 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 #define kvm_apic_present(x) (true)
 #define kvm_lapic_enabled(x) (true)
 
+static inline bool kvm_apic_vid_enabled(void)
+{
+	/* IA64 has no apicv support, do nothing here */
+	return false;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35aa8e6..d90bfa8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,12 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+	int (*has_virtual_interrupt_delivery)(void);
+	void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
+	void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
+	void (*set_svi)(int isr);
+	void (*check_ioapic_entry)(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
@@ -992,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0a54df0..694586c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -62,6 +62,7 @@
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
@@ -144,6 +145,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 
@@ -181,6 +183,7 @@ enum vmcs_field {
 	GUEST_GS_SELECTOR               = 0x0000080a,
 	GUEST_LDTR_SELECTOR             = 0x0000080c,
 	GUEST_TR_SELECTOR               = 0x0000080e,
+	GUEST_INTR_STATUS               = 0x00000810,
 	HOST_ES_SELECTOR                = 0x00000c00,
 	HOST_CS_SELECTOR                = 0x00000c02,
 	HOST_SS_SELECTOR                = 0x00000c04,
@@ -208,6 +211,14 @@ enum vmcs_field {
 	APIC_ACCESS_ADDR_HIGH		= 0x00002015,
 	EPT_POINTER                     = 0x0000201a,
 	EPT_POINTER_HIGH                = 0x0000201b,
+	EOI_EXIT_BITMAP0                = 0x0000201c,
+	EOI_EXIT_BITMAP0_HIGH           = 0x0000201d,
+	EOI_EXIT_BITMAP1                = 0x0000201e,
+	EOI_EXIT_BITMAP1_HIGH           = 0x0000201f,
+	EOI_EXIT_BITMAP2                = 0x00002020,
+	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
+	EOI_EXIT_BITMAP3                = 0x00002022,
+	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b111aee..45a254e 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
 
 /*
+ * check if there is a pending interrupt from
+ * a non-APIC source, without intack.
+ */
+static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+{
+	if (kvm_apic_accept_pic_intr(v))
+		return pic_irqchip(v->kvm)->output;	/* PIC */
+	else
+		return 0;
+}
+
+/*
+ * check if there is an injectable interrupt:
+ * with virtual interrupt delivery enabled,
+ * interrupts from the apic are handled by hardware,
+ * so we don't need to check them here.
+ */
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
+{
+	if (!irqchip_in_kernel(v->kvm))
+		return v->arch.interrupt.pending;
+
+	if (kvm_cpu_has_extint(v))
+		return 1;
+
+	if (kvm_apic_vid_enabled())
+		return 0;
+
+	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
+}
+
+/*
  * check if there is pending interrupt without
  * intack.
  */
@@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;
 
-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return pic_irqchip(v->kvm)->output;	/* PIC */
+	if (kvm_cpu_has_extint(v))
+		return 1;
 
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
 
 /*
+ * Read the pending interrupt (from a non-APIC source)
+ * vector and intack.
+ */
+static int kvm_cpu_get_extint(struct kvm_vcpu *v)
+{
+	if (kvm_cpu_has_extint(v))
+		return kvm_pic_read_irq(v->kvm); /* PIC */
+	return -1;
+}
+
+/*
  * Read pending interrupt vector and intack.
  */
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
+	int vector;
+
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.nr;
 
-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return kvm_pic_read_irq(v->kvm);	/* PIC */
+	vector = kvm_cpu_get_extint(v);
+
+	if (kvm_apic_vid_enabled() || vector != -1)
+		return vector;			/* PIC */
 
 	return kvm_get_apic_interrupt(v);	/* APIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f39aee3..d96515e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -145,23 +145,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
 	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
-{
-	u16 cid;
-	ldr >>= 32 - map->ldr_bits;
-	cid = (ldr >> map->cid_shift) & map->cid_mask;
-
-	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
-
-	return cid;
-}
-
-static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
-{
-	ldr >>= (32 - map->ldr_bits);
-	return ldr & map->lid_mask;
-}
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -225,6 +208,8 @@ out:
 
 	if (old)
 		kfree_rcu(old, rcu);
+
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
@@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 {
 	int result;
 
+	/* Note that irr_pending is just a hint. It will always be
+	 * true with virtual interrupt delivery enabled. */
 	if (!apic->irr_pending)
 		return -1;
 
@@ -456,6 +443,8 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
+
+	/* Note that isr_count is always 1 with vid enabled */
 	if (!apic->isr_count)
 		return -1;
 	if (likely(apic->highest_isr_cache != -1))
@@ -735,6 +724,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
+static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
+{
+	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
+		int trigger_mode;
+		if (apic_test_vector(vector, apic->regs + APIC_TMR))
+			trigger_mode = IOAPIC_LEVEL_TRIG;
+		else
+			trigger_mode = IOAPIC_EDGE_TRIG;
+		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+	}
+}
+
 static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
@@ -751,19 +753,26 @@ static int apic_set_eoi(struct kvm_lapic *apic)
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
 
-	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
-	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
-		int trigger_mode;
-		if (apic_test_vector(vector, apic->regs + APIC_TMR))
-			trigger_mode = IOAPIC_LEVEL_TRIG;
-		else
-			trigger_mode = IOAPIC_EDGE_TRIG;
-		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
-	}
+	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 	return vector;
 }
 
+/*
+ * this interface assumes a trap-like exit, which has already finished
+ * the desired side effects, including the vISR and vPPR updates.
+ */
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	trace_kvm_eoi(apic, vector);
+
+	kvm_ioapic_send_eoi(apic, vector);
+	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
+
 static void apic_send_ipi(struct kvm_lapic *apic)
 {
 	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
@@ -1374,8 +1383,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
-	apic->irr_pending = false;
-	apic->isr_count = 0;
+	apic->irr_pending = kvm_apic_vid_enabled();
+	apic->isr_count = kvm_apic_vid_enabled();
 	apic->highest_isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
@@ -1590,8 +1599,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 	update_divide_count(apic);
 	start_apic_timer(apic);
 	apic->irr_pending = true;
-	apic->isr_count = count_vectors(apic->regs + APIC_ISR);
+	apic->isr_count = kvm_apic_vid_enabled() ?
+				1 : count_vectors(apic->regs + APIC_ISR);
 	apic->highest_isr_cache = -1;
+	kvm_x86_ops->set_svi(apic_find_highest_isr(apic));
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 22a5397..109e37c 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,6 +65,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
@@ -131,4 +132,26 @@ static inline int apic_x2apic_mode(struct kvm_lapic *apic)
 	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
 }
 
+static inline bool kvm_apic_vid_enabled(void)
+{
+	return kvm_x86_ops->has_virtual_interrupt_delivery();
+}
+
+static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
+{
+	u16 cid;
+	ldr >>= 32 - map->ldr_bits;
+	cid = (ldr >> map->cid_shift) & map->cid_mask;
+
+	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
+
+	return cid;
+}
+
+static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
+{
+	ldr >>= (32 - map->ldr_bits);
+	return ldr & map->lid_mask;
+}
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 38407e9..53c31cb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3576,6 +3576,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	return;
 }
 
+static int svm_has_virtual_interrupt_delivery(void)
+{
+	return 0;
+}
+
+static void svm_update_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_set_svi(int isr)
+{
+	return;
+}
+
+static void svm_check_ioapic_entry(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq)
+{
+	return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4296,6 +4317,10 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
+	.update_eoi_exitmap = svm_update_eoi_exitmap,
+	.set_svi = svm_set_svi,
+	.check_ioapic_entry = svm_check_ioapic_entry,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 526955d..8b9752a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,8 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
-static bool __read_mostly enable_apicv_reg = 1;
-module_param(enable_apicv_reg, bool, S_IRUGO);
+static bool __read_mostly enable_apicv_reg_vid = 1;
+module_param(enable_apicv_reg_vid, bool, S_IRUGO);
 
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
@@ -433,6 +433,8 @@ struct vcpu_vmx {
 
 	bool rdtscp_enabled;
 
+	u64 eoi_exit_bitmap[4];
+
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 };
@@ -781,6 +783,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT;
 }
 
+static inline bool cpu_has_vmx_virtual_intr_delivery(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2570,7 +2578,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
-			SECONDARY_EXEC_APIC_REGISTER_VIRT;
+			SECONDARY_EXEC_APIC_REGISTER_VIRT |
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2584,7 +2593,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 
 	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
 		_cpu_based_2nd_exec_control &= ~(
-				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+				SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
@@ -2783,8 +2793,14 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;
 
-	if (!cpu_has_vmx_apic_register_virt())
-		enable_apicv_reg = 0;
+	if (!cpu_has_vmx_apic_register_virt() ||
+				!cpu_has_vmx_virtual_intr_delivery())
+		enable_apicv_reg_vid = 0;
+
+	if (enable_apicv_reg_vid)
+		kvm_x86_ops->update_cr8_intercept = NULL;
+	else
+		kvm_x86_ops->update_apic_irq = NULL;
 
 	if (nested)
 		nested_vmx_setup_ctls_msrs();
@@ -3948,8 +3964,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 	if (!ple_gap)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
-	if (!enable_apicv_reg)
-		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
+	if (!enable_apicv_reg_vid)
+		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
 	return exec_control;
 }
@@ -3995,6 +4012,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				vmx_secondary_exec_control(vmx));
 	}
 
+	if (enable_apicv_reg_vid) {
+		vmcs_write64(EOI_EXIT_BITMAP0, 0);
+		vmcs_write64(EOI_EXIT_BITMAP1, 0);
+		vmcs_write64(EOI_EXIT_BITMAP2, 0);
+		vmcs_write64(EOI_EXIT_BITMAP3, 0);
+
+		vmcs_write16(GUEST_INTR_STATUS, 0);
+	}
+
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmcs_write32(PLE_WINDOW, ple_window);
@@ -4912,6 +4938,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
+static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	int vector = exit_qualification & 0xff;
+
+	/* EOI-induced VM exits are trap-like; no need to adjust the IP */
+	kvm_apic_set_eoi_accelerated(vcpu, vector);
+	return 1;
+}
+
 static int handle_apic_write(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -5857,6 +5893,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
+	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
@@ -6214,7 +6251,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 
 	/* There is no point in enabling virtualized x2apic mode without
 	 * apicv */
-	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
+	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg_vid)
 		return;
 
 	if (set) {
@@ -6253,6 +6290,130 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	vmx_set_msr_bitmap(vcpu);
 }
 
+static int vmx_has_virtual_interrupt_delivery(void)
+{
+	return enable_apicv_reg_vid;
+}
+
+static void vmx_set_svi(int isr)
+{
+	u16 status;
+	u8 old;
+
+	if (!enable_apicv_reg_vid)
+		return;
+
+	if (isr == -1)
+		isr = 0;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = status >> 8;
+	if (isr != old) {
+		status &= 0xff;
+		status |= isr << 8;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_set_rvi(int vector)
+{
+	u16 status;
+	u8 old;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = (u8)status & 0xff;
+	if ((u8)vector != old) {
+		status &= ~0xff;
+		status |= (u8)vector;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr)
+{
+	if (max_irr == -1)
+		return;
+
+	vmx_set_rvi(max_irr);
+}
+
+static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
+				u32 vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (WARN_ONCE((vector > 255),
+		"KVM VMX: vector (%d) out of range\n", vector))
+		return;
+
+	__set_bit(vector, (unsigned long *)vmx->eoi_exit_bitmap);
+}
+
+static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq)
+{
+	struct kvm_lapic **dst;
+	struct kvm_apic_map *map;
+	unsigned long bitmap = 1;
+	int i;
+
+	rcu_read_lock();
+	map = rcu_dereference(vcpu->kvm->arch.apic_map);
+
+	if (unlikely(!map)) {
+		set_eoi_exitmap_one(vcpu, irq->vector);
+		goto out;
+	}
+
+	if (irq->dest_mode == 0) { /* physical mode */
+		if (irq->delivery_mode == APIC_DM_LOWEST ||
+				irq->dest_id == 0xff) {
+			set_eoi_exitmap_one(vcpu, irq->vector);
+			goto out;
+		}
+		dst = &map->phys_map[irq->dest_id & 0xff];
+	} else {
+		u32 mda = irq->dest_id << (32 - map->ldr_bits);
+
+		dst = map->logical_map[apic_cluster_id(map, mda)];
+
+		bitmap = apic_logical_id(map, mda);
+	}
+
+	for_each_set_bit(i, &bitmap, 16) {
+		if (!dst[i])
+			continue;
+		if (dst[i]->vcpu == vcpu) {
+			set_eoi_exitmap_one(vcpu, irq->vector);
+			break;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+}
+
+static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	vmcs_write64(EOI_EXIT_BITMAP0, vmx->eoi_exit_bitmap[0]);
+	vmcs_write64(EOI_EXIT_BITMAP1, vmx->eoi_exit_bitmap[1]);
+	vmcs_write64(EOI_EXIT_BITMAP2, vmx->eoi_exit_bitmap[2]);
+	vmcs_write64(EOI_EXIT_BITMAP3, vmx->eoi_exit_bitmap[3]);
+}
+
+static void vmx_update_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* clear eoi exit bitmap */
+	memset(vmx->eoi_exit_bitmap, 0, 32);
+
+	set_eoi_exitmap(vcpu);
+	vmx_load_eoi_exitmap(vcpu);
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7517,6 +7678,11 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
+	.check_ioapic_entry = vmx_check_ioapic_entry,
+	.update_apic_irq = vmx_update_apic_irq,
+	.update_eoi_exitmap = vmx_update_eoi_exitmap,
+	.set_svi = vmx_set_svi,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1c9c834..1f5b5ac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5527,7 +5527,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 			vcpu->arch.nmi_injected = true;
 			kvm_x86_ops->set_nmi(vcpu);
 		}
-	} else if (kvm_cpu_has_interrupt(vcpu)) {
+	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);
@@ -5648,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_handle_pmu_event(vcpu);
 		if (kvm_check_request(KVM_REQ_PMI, vcpu))
 			kvm_deliver_pmi(vcpu);
+		if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
+			kvm_x86_ops->update_eoi_exitmap(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
@@ -5656,10 +5658,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		/* enable NMI/IRQ window open exits if needed */
 		if (vcpu->arch.nmi_pending)
 			kvm_x86_ops->enable_nmi_window(vcpu);
-		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
+			/* update architecture-specific hints for APIC
+			 * virtual interrupt delivery */
+			if (kvm_x86_ops->update_apic_irq)
+				kvm_x86_ops->update_apic_irq(vcpu,
+					      kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e224f7a..e8e7b6a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -122,6 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 	gva_t addr, void *val, unsigned int bytes,
 	struct x86_exception *exception);
 
+void set_eoi_exitmap(struct kvm_vcpu *vcpu);
+
 extern u64 host_xcr0;
 
 extern struct static_key kvm_no_apic_vcpu;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cbe0d68..bc0e261 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -122,6 +122,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_WATCHDOG          18
 #define KVM_REQ_MASTERCLOCK_UPDATE 19
 #define KVM_REQ_MCLOCK_INPROGRESS 20
+#define KVM_REQ_EOIBITMAP         21
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
@@ -537,6 +538,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
+void kvm_make_update_eoibitmap_request(struct kvm *kvm);
 
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
@@ -690,6 +692,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
 		int irq_source_id, int level);
+bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index f3abbef..d747263 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -35,6 +35,7 @@
 #include <linux/hrtimer.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/export.h>
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <asm/current.h>
@@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
 	smp_wmb();
 }
 
+void set_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
+	union kvm_ioapic_redirect_entry *e;
+	struct kvm_lapic_irq irqe;
+	int index;
+
+	spin_lock(&ioapic->lock);
+	/* traverse the ioapic entries to set the eoi exit bitmap */
+	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
+		e = &ioapic->redirtbl[index];
+		if (!e->fields.mask &&
+			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+				 index))) {
+			irqe.dest_id = e->fields.dest_id;
+			irqe.vector = e->fields.vector;
+			irqe.dest_mode = e->fields.dest_mode;
+			irqe.delivery_mode = e->fields.delivery_mode << 8;
+			kvm_x86_ops->check_ioapic_entry(vcpu, &irqe);
+
+		}
+	}
+	spin_unlock(&ioapic->lock);
+}
+EXPORT_SYMBOL_GPL(set_eoi_exitmap);
+
+void ioapic_update_eoi_exitmap(struct kvm *kvm)
+{
+	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+	if (!kvm_apic_vid_enabled() || !ioapic)
+		return;
+	kvm_make_update_eoibitmap_request(kvm);
+}
+
 static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 {
 	unsigned index;
@@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
 		    && ioapic->irr & (1 << index))
 			ioapic_service(ioapic, index);
+		ioapic_update_eoi_exitmap(ioapic->kvm);
 		break;
 	}
 }
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a30abfe..e8d67cf 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
+void ioapic_update_eoi_exitmap(struct kvm *kvm);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 656fa45..84195f8 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -22,6 +22,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/slab.h>
+#include <linux/export.h>
 #include <trace/events/kvm.h>
 
 #include <asm/msidef.h>
@@ -237,6 +238,28 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 	return ret;
 }
 
+bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
+{
+	struct kvm_irq_ack_notifier *kian;
+	struct hlist_node *n;
+	int gsi;
+
+	rcu_read_lock();
+	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	if (gsi != -1)
+		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
+					 link)
+			if (kian->gsi == gsi) {
+				rcu_read_unlock();
+				return true;
+			}
+
+	rcu_read_unlock();
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
+
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
@@ -261,6 +284,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 	mutex_lock(&kvm->irq_lock);
 	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
 	mutex_unlock(&kvm->irq_lock);
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
@@ -270,6 +294,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 	hlist_del_init_rcu(&kian->link);
 	mutex_unlock(&kvm->irq_lock);
 	synchronize_rcu();
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e45c20c..cc465c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -217,6 +217,11 @@ void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 	make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
+void kvm_make_update_eoibitmap_request(struct kvm *kvm)
+{
+	make_all_cpus_request(kvm, KVM_REQ_EOIBITMAP);
+}
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
 	struct page *page;
-- 
1.7.1



* Re: [PATCH v11 0/3] x86, apicv: Add APIC virtualization support
  2013-01-16 10:21 [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
                   ` (2 preceding siblings ...)
  2013-01-16 10:21 ` [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
@ 2013-01-16 10:27 ` Gleb Natapov
  3 siblings, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2013-01-16 10:27 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, haitao.shan, mtosatti, xiantao.zhang

Please be patient. Neither Marcelo nor I have reviewed the previous
submission yet. There is no point in sending a new one.

On Wed, Jan 16, 2013 at 06:21:09PM +0800, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> APIC virtualization is a new feature which can eliminate most of VM exit
> when vcpu handle a interrupt:
> 
> APIC register virtualization:
>         APIC read access doesn't cause APIC-access VM exits.
>         APIC write becomes trap-like.
> 
> Virtual interrupt delivery:
>         Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
>         manually, which is fully taken care of by the hardware.
> 
> Please refer to Intel SDM volume 3, chapter 29 for more details.
> Changes v10 to v11:
>  * Use two new msr bitmaps for guest that enabling x2apic mode:
>    Since msr bitmap is shared by all guests, it will break guest that
>    not using x2apic when updating the global msr bitmap. To solve this,
>    we use two new msr bitmap for guest which using x2apic.
> 
> Changes v9 to v10:
>  * Enable virtualize x2apic mode when guest is using x2apic and apicv:
>    There is no point to enable x2apic mode when apicv is disabled.
>  * Grep ioapic_lock when traversing ioapic entry to set eoi exit bitmap
>  * Rebased on top of KVM upstream
> 
> Changes v8 to v9:
>  * Update eoi exit bitmap by vcpu itself.
>  * Enable virtualize x2apic mode when guest is using x2apic.
>  * Rebase on top of KVM upstream
> 
> Changes v7 to v8:
>  * According Marcelo's suggestion, add comments for irr_pending and isr_count,
>    since the two valiables have different meaning when using apicv.
>  * Set highest bit in vISR to SVI after migation.
>  * Use spinlock to access eoi exit bitmap synchronously.
>  * Enable virtualize x2apic mode when guest is using x2apic
>  * Rebased on top of KVM upstream.
> 
> Changes v6 to v7:
>  * fix a bug when set exit bitmap.
>  * Rebased on top of KVM upstream.
> 
> Changes v5 to v6:
>  * minor adjustments according gleb's comments
>  * Rebased on top of KVM upstream.
> 
> Changes v4 to v5:
>  * Set eoi exit bitmap when an interrupt has notifier registered.
>  * Use request to track ioapic entry's modification.
>  * Rebased on top of KVM upstream.
> 
> Changes v3 to v4:
>  * use one option to control both register virtualization and virtual interrupt
>    delivery.
>  * Update eoi exit bitmap when programing ioapic or programing apic's id/dfr/ldr.
>  * Rebased on top of KVM upstream.
> 
> Changes v2 to v3:
>  * Drop Posted Interrupt patch from v3.
>    According Gleb's suggestion, we will use global vector for all VCPUs as notification
>    event vector. So we will rewrite the Posted Interrupt patch. And resend it later.
>  * Use TMR to set the eoi exiting bitmap. We only want to set eoi exiting bitmap for
>    those interrupt which is level trigger or has notifier in EOI write path. So TMR is
>    enough to distinguish the interrupt trigger mode.
>  * Simplify some code according Gleb's comments.
>  * rebased on top of KVM upstream.
> 
> Changes v1 to v2:
>  * Add Posted Interrupt support in this series patch.
>  * Since there is a notifer hook in vAPIC EOI for PIT interrupt. So always Set PIT
>    interrupt in eoi exit bitmap to force vmexit when EOI to interrupt.
>  * Rebased on top of KVM upstream
> 
> Yang Zhang (3):
>   x86, apicv: add APICv register virtualization support
>   x86, apicv: add virtual x2apic support
>   x86, apicv: add virtual interrupt delivery support
> 
>  arch/ia64/kvm/lapic.h           |    6 +
>  arch/x86/include/asm/kvm_host.h |    8 +
>  arch/x86/include/asm/vmx.h      |   14 ++
>  arch/x86/kvm/irq.c              |   56 ++++++-
>  arch/x86/kvm/lapic.c            |   99 +++++++----
>  arch/x86/kvm/lapic.h            |   25 +++
>  arch/x86/kvm/svm.c              |   31 ++++
>  arch/x86/kvm/vmx.c              |  366 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c              |   11 +-
>  arch/x86/kvm/x86.h              |    2 +
>  include/linux/kvm_host.h        |    3 +
>  virt/kvm/ioapic.c               |   38 ++++
>  virt/kvm/ioapic.h               |    1 +
>  virt/kvm/irq_comm.c             |   25 +++
>  virt/kvm/kvm_main.c             |    5 +
>  15 files changed, 641 insertions(+), 49 deletions(-)

--
			Gleb.


* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-16 10:21 ` [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
@ 2013-01-17  1:26   ` Zhang, Yang Z
  2013-01-20 12:51     ` Gleb Natapov
  2013-01-21 21:05   ` Marcelo Tosatti
  1 sibling, 1 reply; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-17  1:26 UTC (permalink / raw)
  To: gleb, kvm; +Cc: Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

The previous patch is stale; resending a new one.
The only change is to clear the EOI and SELF-IPI registers in the MSR
bitmap when vid is enabled.
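
Concretely, the delta is the two extra calls in
vmx_set_virtual_x2apic_mode() (see the vmx.c hunk below); with vid
enabled the hardware handles EOI and SELF-IPI writes, so the write
intercepts for those two x2APIC MSRs are dropped:

		/* EOI */
		vmx_intercept_for_msr_write_x2apic(0x80b, false);
		/* SELF-IPI */
		vmx_intercept_for_msr_write_x2apic(0x83f, false);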

------------------------
From: Yang Zhang <yang.z.zhang@Intel.com>

Virtual interrupt delivery removes the need for KVM to inject vAPIC
interrupts manually; that is fully taken care of by the hardware. It
requires some special awareness in the existing interrupt injection
path:

- For a pending interrupt, instead of injecting it directly, we may
  need to update architecture-specific indicators before resuming the
  guest (a minimal sketch follows this list).

- A pending interrupt that is masked by the ISR should also be
  considered in the update above, since the hardware decides when to
  inject it at the right time. The current has_interrupt and
  get_interrupt only return a valid vector from the injection point
  of view.
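
For the first point, the concrete hooks are in the x86.c and vmx.c
hunks below; on VMX the update amounts to publishing the highest
pending IRR vector as RVI in GUEST_INTR_STATUS before each entry,
roughly:

	if (kvm_lapic_enabled(vcpu)) {
		/* publish the highest pending vector as RVI instead of
		 * injecting it; the CPU delivers it when the PPR allows */
		if (kvm_x86_ops->update_apic_irq)
			kvm_x86_ops->update_apic_irq(vcpu,
				      kvm_lapic_find_highest_irr(vcpu));
	}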

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 arch/ia64/kvm/lapic.h           |    6 ++
 arch/x86/include/asm/kvm_host.h |    7 ++
 arch/x86/include/asm/vmx.h      |   11 +++
 arch/x86/kvm/irq.c              |   56 +++++++++++-
 arch/x86/kvm/lapic.c            |   69 ++++++++------
 arch/x86/kvm/lapic.h            |   23 +++++
 arch/x86/kvm/svm.c              |   25 +++++
 arch/x86/kvm/vmx.c              |  188 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |   11 ++-
 arch/x86/kvm/x86.h              |    2 +
 include/linux/kvm_host.h        |    3 +
 virt/kvm/ioapic.c               |   38 ++++++++
 virt/kvm/ioapic.h               |    1 +
 virt/kvm/irq_comm.c             |   25 +++++
 virt/kvm/kvm_main.c             |    5 +
 15 files changed, 425 insertions(+), 45 deletions(-)

diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index c5f92a9..c3e2935 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
 #define kvm_apic_present(x) (true)
 #define kvm_lapic_enabled(x) (true)
 
+static inline bool kvm_apic_vid_enabled(void)
+{
+	/* IA64 has no APICv support, do nothing here */
+	return false;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35aa8e6..d90bfa8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,12 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+	int (*has_virtual_interrupt_delivery)(void);
+	void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
+	void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
+	void (*set_svi)(int isr);
+	void (*check_ioapic_entry)(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
@@ -992,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0a54df0..694586c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -62,6 +62,7 @@
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
@@ -144,6 +145,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 
@@ -181,6 +183,7 @@ enum vmcs_field {
 	GUEST_GS_SELECTOR               = 0x0000080a,
 	GUEST_LDTR_SELECTOR             = 0x0000080c,
 	GUEST_TR_SELECTOR               = 0x0000080e,
+	GUEST_INTR_STATUS               = 0x00000810,
 	HOST_ES_SELECTOR                = 0x00000c00,
 	HOST_CS_SELECTOR                = 0x00000c02,
 	HOST_SS_SELECTOR                = 0x00000c04,
@@ -208,6 +211,14 @@ enum vmcs_field {
 	APIC_ACCESS_ADDR_HIGH		= 0x00002015,
 	EPT_POINTER                     = 0x0000201a,
 	EPT_POINTER_HIGH                = 0x0000201b,
+	EOI_EXIT_BITMAP0                = 0x0000201c,
+	EOI_EXIT_BITMAP0_HIGH           = 0x0000201d,
+	EOI_EXIT_BITMAP1                = 0x0000201e,
+	EOI_EXIT_BITMAP1_HIGH           = 0x0000201f,
+	EOI_EXIT_BITMAP2                = 0x00002020,
+	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
+	EOI_EXIT_BITMAP3                = 0x00002022,
+	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b111aee..45a254e 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
 
 /*
+ * check if there is a pending interrupt from a
+ * non-APIC source, without intack.
+ */
+static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+{
+	if (kvm_apic_accept_pic_intr(v))
+		return pic_irqchip(v->kvm)->output;	/* PIC */
+	else
+		return 0;
+}
+
+/*
+ * check if there is an injectable interrupt:
+ * when virtual interrupt delivery is enabled,
+ * interrupts from the apic are handled by hardware
+ * and we don't need to check them here.
+ */
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
+{
+	if (!irqchip_in_kernel(v->kvm))
+		return v->arch.interrupt.pending;
+
+	if (kvm_cpu_has_extint(v))
+		return 1;
+
+	if (kvm_apic_vid_enabled())
+		return 0;
+
+	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
+}
+
+/*
  * check if there is pending interrupt without
  * intack.
  */
@@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;
 
-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return pic_irqchip(v->kvm)->output;	/* PIC */
+	if (kvm_cpu_has_extint(v))
+		return 1;
 
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
 
 /*
+ * Read the pending interrupt (from a non-APIC
+ * source) vector and intack.
+ */
+static int kvm_cpu_get_extint(struct kvm_vcpu *v)
+{
+	if (kvm_cpu_has_extint(v))
+		return kvm_pic_read_irq(v->kvm); /* PIC */
+	return -1;
+}
+
+/*
  * Read pending interrupt vector and intack.
  */
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
+	int vector;
+
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.nr;
 
-	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
-		return kvm_pic_read_irq(v->kvm);	/* PIC */
+	vector = kvm_cpu_get_extint(v);
+
+	if (kvm_apic_vid_enabled() || vector != -1)
+		return vector;			/* PIC */
 
 	return kvm_get_apic_interrupt(v);	/* APIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f39aee3..d96515e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -145,23 +145,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
 	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
-{
-	u16 cid;
-	ldr >>= 32 - map->ldr_bits;
-	cid = (ldr >> map->cid_shift) & map->cid_mask;
-
-	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
-
-	return cid;
-}
-
-static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
-{
-	ldr >>= (32 - map->ldr_bits);
-	return ldr & map->lid_mask;
-}
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -225,6 +208,8 @@ out:
 
 	if (old)
 		kfree_rcu(old, rcu);
+
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
@@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 {
 	int result;
 
+	/* Note that irr_pending is just a hint. It will always
+	 * be true with virtual interrupt delivery enabled. */
 	if (!apic->irr_pending)
 		return -1;
 
@@ -456,6 +443,8 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
+
+	/* Note that isr_count is always 1 with vid enabled. */
 	if (!apic->isr_count)
 		return -1;
 	if (likely(apic->highest_isr_cache != -1))
@@ -735,6 +724,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
+static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
+{
+	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
+		int trigger_mode;
+		if (apic_test_vector(vector, apic->regs + APIC_TMR))
+			trigger_mode = IOAPIC_LEVEL_TRIG;
+		else
+			trigger_mode = IOAPIC_EDGE_TRIG;
+		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+	}
+}
+
 static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
@@ -751,19 +753,26 @@ static int apic_set_eoi(struct kvm_lapic *apic)
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
 
-	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
-	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
-		int trigger_mode;
-		if (apic_test_vector(vector, apic->regs + APIC_TMR))
-			trigger_mode = IOAPIC_LEVEL_TRIG;
-		else
-			trigger_mode = IOAPIC_EDGE_TRIG;
-		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
-	}
+	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 	return vector;
 }
 
+/*
+ * this interface assumes a trap-like exit, which has already finished
+ * the desired side effects, including the vISR and vPPR updates.
+ */
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	trace_kvm_eoi(apic, vector);
+
+	kvm_ioapic_send_eoi(apic, vector);
+	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
+
 static void apic_send_ipi(struct kvm_lapic *apic)
 {
 	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
@@ -1374,8 +1383,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
 		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
-	apic->irr_pending = false;
-	apic->isr_count = 0;
+	apic->irr_pending = kvm_apic_vid_enabled();
+	apic->isr_count = kvm_apic_vid_enabled();
 	apic->highest_isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
@@ -1590,8 +1599,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 	update_divide_count(apic);
 	start_apic_timer(apic);
 	apic->irr_pending = true;
-	apic->isr_count = count_vectors(apic->regs + APIC_ISR);
+	apic->isr_count = kvm_apic_vid_enabled() ?
+				1 : count_vectors(apic->regs + APIC_ISR);
 	apic->highest_isr_cache = -1;
+	kvm_x86_ops->set_svi(apic_find_highest_isr(apic));
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 22a5397..109e37c 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,6 +65,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
@@ -131,4 +132,26 @@ static inline int apic_x2apic_mode(struct kvm_lapic *apic)
 	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
 }
 
+static inline bool kvm_apic_vid_enabled(void)
+{
+	return kvm_x86_ops->has_virtual_interrupt_delivery();
+}
+
+static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
+{
+	u16 cid;
+	ldr >>= 32 - map->ldr_bits;
+	cid = (ldr >> map->cid_shift) & map->cid_mask;
+
+	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
+
+	return cid;
+}
+
+static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
+{
+	ldr >>= (32 - map->ldr_bits);
+	return ldr & map->lid_mask;
+}
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 38407e9..53c31cb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3576,6 +3576,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	return;
 }
 
+static int svm_has_virtual_interrupt_delivery(void)
+{
+	return 0;
+}
+
+static void svm_update_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_set_svi(int isr)
+{
+	return;
+}
+
+static void svm_check_ioapic_entry(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq)
+{
+	return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4296,6 +4317,10 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
+	.update_eoi_exitmap = svm_update_eoi_exitmap,
+	.set_svi = svm_set_svi,
+	.check_ioapic_entry = svm_check_ioapic_entry,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 526955d..b0de04f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,8 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
-static bool __read_mostly enable_apicv_reg = 1;
-module_param(enable_apicv_reg, bool, S_IRUGO);
+static bool __read_mostly enable_apicv_reg_vid = 1;
+module_param(enable_apicv_reg_vid, bool, S_IRUGO);
 
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
@@ -433,6 +433,8 @@ struct vcpu_vmx {
 
 	bool rdtscp_enabled;
 
+	u64 eoi_exit_bitmap[4];
+
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 };
@@ -781,6 +783,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT;
 }
 
+static inline bool cpu_has_vmx_virtual_intr_delivery(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2570,7 +2578,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
-			SECONDARY_EXEC_APIC_REGISTER_VIRT;
+			SECONDARY_EXEC_APIC_REGISTER_VIRT |
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2584,7 +2593,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 
 	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
 		_cpu_based_2nd_exec_control &= ~(
-				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+				SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
@@ -2783,8 +2793,14 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;
 
-	if (!cpu_has_vmx_apic_register_virt())
-		enable_apicv_reg = 0;
+	if (!cpu_has_vmx_apic_register_virt() ||
+				!cpu_has_vmx_virtual_intr_delivery())
+		enable_apicv_reg_vid = 0;
+
+	if (enable_apicv_reg_vid)
+		kvm_x86_ops->update_cr8_intercept = NULL;
+	else
+		kvm_x86_ops->update_apic_irq = NULL;
 
 	if (nested)
 		nested_vmx_setup_ctls_msrs();
@@ -3948,8 +3964,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
 	if (!ple_gap)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
-	if (!enable_apicv_reg)
-		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
+	if (!enable_apicv_reg_vid)
+		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
 	return exec_control;
 }
@@ -3995,6 +4012,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				vmx_secondary_exec_control(vmx));
 	}
 
+	if (enable_apicv_reg_vid) {
+		vmcs_write64(EOI_EXIT_BITMAP0, 0);
+		vmcs_write64(EOI_EXIT_BITMAP1, 0);
+		vmcs_write64(EOI_EXIT_BITMAP2, 0);
+		vmcs_write64(EOI_EXIT_BITMAP3, 0);
+
+		vmcs_write16(GUEST_INTR_STATUS, 0);
+	}
+
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmcs_write32(PLE_WINDOW, ple_window);
@@ -4912,6 +4938,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
+static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	int vector = exit_qualification & 0xff;
+
+	/* The EOI-induced VM exit is trap-like, so there is no need to adjust IP */
+	kvm_apic_set_eoi_accelerated(vcpu, vector);
+	return 1;
+}
+
 static int handle_apic_write(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -5857,6 +5893,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
+	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
@@ -6214,7 +6251,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 
 	/* There is no point in enabling virtualize x2apic without
 	 * enabling apicv */
-	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
+	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg_vid)
 		return;
 
 	if (set) {
@@ -6249,10 +6286,138 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 		vmx_intercept_for_msr_read_x2apic(0x839, true);
 		/* TPR */
 		vmx_intercept_for_msr_write_x2apic(0x808, false);
+		/* EOI */
+		vmx_intercept_for_msr_write_x2apic(0x80b, false);
+		/* SELF-IPI */
+		vmx_intercept_for_msr_write_x2apic(0x83f, false);
 	}
 	vmx_set_msr_bitmap(vcpu);
 }
 
+static int vmx_has_virtual_interrupt_delivery(void)
+{
+	return enable_apicv_reg_vid;
+}
+
+static void vmx_set_svi(int isr)
+{
+	u16 status;
+	u8 old;
+
+	if (!enable_apicv_reg_vid)
+		return;
+
+	if (isr == -1)
+		isr = 0;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = status >> 8;
+	if (isr != old) {
+		status &= 0xff;
+		status |= isr << 8;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_set_rvi(int vector)
+{
+	u16 status;
+	u8 old;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = (u8)status & 0xff;
+	if ((u8)vector != old) {
+		status &= ~0xff;
+		status |= (u8)vector;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr)
+{
+	if (max_irr == -1)
+		return;
+
+	vmx_set_rvi(max_irr);
+}
+
+static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
+				u32 vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (WARN_ONCE((vector > 255),
+		"KVM VMX: vector (%d) out of range\n", vector))
+		return;
+
+	__set_bit(vector, (unsigned long *)vmx->eoi_exit_bitmap);
+}
+
+static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
+				   struct kvm_lapic_irq *irq)
+{
+	struct kvm_lapic **dst;
+	struct kvm_apic_map *map;
+	unsigned long bitmap = 1;
+	int i;
+
+	rcu_read_lock();
+	map = rcu_dereference(vcpu->kvm->arch.apic_map);
+
+	if (unlikely(!map)) {
+		set_eoi_exitmap_one(vcpu, irq->vector);
+		goto out;
+	}
+
+	if (irq->dest_mode == 0) { /* physical mode */
+		if (irq->delivery_mode == APIC_DM_LOWEST ||
+				irq->dest_id == 0xff) {
+			set_eoi_exitmap_one(vcpu, irq->vector);
+			goto out;
+		}
+		dst = &map->phys_map[irq->dest_id & 0xff];
+	} else {
+		u32 mda = irq->dest_id << (32 - map->ldr_bits);
+
+		dst = map->logical_map[apic_cluster_id(map, mda)];
+
+		bitmap = apic_logical_id(map, mda);
+	}
+
+	for_each_set_bit(i, &bitmap, 16) {
+		if (!dst[i])
+			continue;
+		if (dst[i]->vcpu == vcpu) {
+			set_eoi_exitmap_one(vcpu, irq->vector);
+			break;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+}
+
+static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	vmcs_write64(EOI_EXIT_BITMAP0, vmx->eoi_exit_bitmap[0]);
+	vmcs_write64(EOI_EXIT_BITMAP1, vmx->eoi_exit_bitmap[1]);
+	vmcs_write64(EOI_EXIT_BITMAP2, vmx->eoi_exit_bitmap[2]);
+	vmcs_write64(EOI_EXIT_BITMAP3, vmx->eoi_exit_bitmap[3]);
+}
+
+static void vmx_update_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* clear eoi exit bitmap */
+	memset(vmx->eoi_exit_bitmap, 0, 32);
+
+	set_eoi_exitmap(vcpu);
+	vmx_load_eoi_exitmap(vcpu);
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7517,6 +7682,11 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
+	.check_ioapic_entry = vmx_check_ioapic_entry,
+	.update_apic_irq = vmx_update_apic_irq,
+	.update_eoi_exitmap = vmx_update_eoi_exitmap,
+	.set_svi = vmx_set_svi,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1c9c834..1f5b5ac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5527,7 +5527,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 			vcpu->arch.nmi_injected = true;
 			kvm_x86_ops->set_nmi(vcpu);
 		}
-	} else if (kvm_cpu_has_interrupt(vcpu)) {
+	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);
@@ -5648,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_handle_pmu_event(vcpu);
 		if (kvm_check_request(KVM_REQ_PMI, vcpu))
 			kvm_deliver_pmi(vcpu);
+		if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
+			kvm_x86_ops->update_eoi_exitmap(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
@@ -5656,10 +5658,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		/* enable NMI/IRQ window open exits if needed */
 		if (vcpu->arch.nmi_pending)
 			kvm_x86_ops->enable_nmi_window(vcpu);
-		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
+			/* update architecture-specific hints for APIC
+			 * virtual interrupt delivery */
+			if (kvm_x86_ops->update_apic_irq)
+				kvm_x86_ops->update_apic_irq(vcpu,
+					      kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e224f7a..e8e7b6a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -122,6 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 	gva_t addr, void *val, unsigned int bytes,
 	struct x86_exception *exception);
 
+void set_eoi_exitmap(struct kvm_vcpu *vcpu);
+
 extern u64 host_xcr0;
 
 extern struct static_key kvm_no_apic_vcpu;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cbe0d68..bc0e261 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -122,6 +122,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_WATCHDOG          18
 #define KVM_REQ_MASTERCLOCK_UPDATE 19
 #define KVM_REQ_MCLOCK_INPROGRESS 20
+#define KVM_REQ_EOIBITMAP         21
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
@@ -537,6 +538,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
+void kvm_make_update_eoibitmap_request(struct kvm *kvm);
 
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
@@ -690,6 +692,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
 		int irq_source_id, int level);
+bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index f3abbef..d747263 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -35,6 +35,7 @@
 #include <linux/hrtimer.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/export.h>
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <asm/current.h>
@@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
 	smp_wmb();
 }
 
+void set_eoi_exitmap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
+	union kvm_ioapic_redirect_entry *e;
+	struct kvm_lapic_irq irqe;
+	int index;
+
+	spin_lock(&ioapic->lock);
+	/* traverse the ioapic entries to set the eoi exit bitmap */
+	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
+		e = &ioapic->redirtbl[index];
+		if (!e->fields.mask &&
+			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+				 index))) {
+			irqe.dest_id = e->fields.dest_id;
+			irqe.vector = e->fields.vector;
+			irqe.dest_mode = e->fields.dest_mode;
+			irqe.delivery_mode = e->fields.delivery_mode << 8;
+			kvm_x86_ops->check_ioapic_entry(vcpu, &irqe);
+
+		}
+	}
+	spin_unlock(&ioapic->lock);
+}
+EXPORT_SYMBOL_GPL(set_eoi_exitmap);
+
+void ioapic_update_eoi_exitmap(struct kvm *kvm)
+{
+	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+	if (!kvm_apic_vid_enabled() || !ioapic)
+		return;
+	kvm_make_update_eoibitmap_request(kvm);
+}
+
 static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 {
 	unsigned index;
@@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
 		    && ioapic->irr & (1 << index))
 			ioapic_service(ioapic, index);
+		ioapic_update_eoi_exitmap(ioapic->kvm);
 		break;
 	}
 }
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a30abfe..e8d67cf 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
+void ioapic_update_eoi_exitmap(struct kvm *kvm);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 656fa45..84195f8 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -22,6 +22,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/slab.h>
+#include <linux/export.h>
 #include <trace/events/kvm.h>
 
 #include <asm/msidef.h>
@@ -237,6 +238,28 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 	return ret;
 }
 
+bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
+{
+	struct kvm_irq_ack_notifier *kian;
+	struct hlist_node *n;
+	int gsi;
+
+	rcu_read_lock();
+	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	if (gsi != -1)
+		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
+					 link)
+			if (kian->gsi == gsi) {
+				rcu_read_unlock();
+				return true;
+			}
+
+	rcu_read_unlock();
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
+
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
@@ -261,6 +284,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 	mutex_lock(&kvm->irq_lock);
 	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
 	mutex_unlock(&kvm->irq_lock);
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
@@ -270,6 +294,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 	hlist_del_init_rcu(&kian->link);
 	mutex_unlock(&kvm->irq_lock);
 	synchronize_rcu();
+	ioapic_update_eoi_exitmap(kvm);
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e45c20c..cc465c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -217,6 +217,11 @@ void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 	make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
+void kvm_make_update_eoibitmap_request(struct kvm *kvm)
+{
+	make_all_cpus_request(kvm, KVM_REQ_EOIBITMAP);
+}
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
 	struct page *page;
-- 
1.7.1



* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-16 10:21 ` [PATCH v11 2/3] x86, apicv: add virtual x2apic support Yang Zhang
@ 2013-01-17 13:22   ` Gleb Natapov
  2013-01-18  1:49     ` Zhang, Yang Z
  2013-01-21 19:59   ` Marcelo Tosatti
  1 sibling, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-17 13:22 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, haitao.shan, mtosatti, xiantao.zhang, Kevin Tian

On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> basically to benefit from apicv, we need to enable virtualized x2apic mode.
> Currently, we only enable it when guest is really using x2apic.
> 
> Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
>     0x800 - 0x8ff: no read intercept for apicv register virtualization,
>                    except APIC ID and TMCCT which need software's assistance to
> 		   get right value.
> 
> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/include/asm/vmx.h      |    1 +
>  arch/x86/kvm/lapic.c            |   20 ++--
>  arch/x86/kvm/lapic.h            |    5 +
>  arch/x86/kvm/svm.c              |    6 +
>  arch/x86/kvm/vmx.c              |  204 +++++++++++++++++++++++++++++++++++----
>  6 files changed, 209 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c431b33..35aa8e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -697,6 +697,7 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> +	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 44c3f7e..0a54df0 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -139,6 +139,7 @@
>  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
>  #define SECONDARY_EXEC_ENABLE_EPT               0x00000002
>  #define SECONDARY_EXEC_RDTSCP			0x00000008
> +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010
>  #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
>  #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
>  #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0664c13..f39aee3 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>  	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>  	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>  
> -static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> -{
> -	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> -}
> -
>  static inline int kvm_apic_id(struct kvm_lapic *apic)
>  {
>  	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> @@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>  	if (!kvm_vcpu_is_bsp(apic->vcpu))
>  		value &= ~MSR_IA32_APICBASE_BSP;
>  
> -	vcpu->arch.apic_base = value;
> -	if (apic_x2apic_mode(apic)) {
> -		u32 id = kvm_apic_id(apic);
> -		u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> -		kvm_apic_set_ldr(apic, ldr);
> +	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
> +		if (value & X2APIC_ENABLE) {
> +			u32 id = kvm_apic_id(apic);
> +			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> +			kvm_apic_set_ldr(apic, ldr);
> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> +		} else
> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
>  	}
> +
> +	vcpu->arch.apic_base = value;
>  	apic->base_address = apic->vcpu->arch.apic_base &
>  			     MSR_IA32_APICBASE_BASE;
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 9a8ee22..22a5397 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -126,4 +126,9 @@ static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
>  	return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic);
>  }
>  
> +static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> +{
> +	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> +}
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d29d3cd..38407e9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>  }
>  
> +static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> +{
> +	return;
> +}
> +
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.enable_nmi_window = enable_nmi_window,
>  	.enable_irq_window = enable_irq_window,
>  	.update_cr8_intercept = update_cr8_intercept,
> +	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
>  
>  	.set_tss_addr = svm_set_tss_addr,
>  	.get_tdp_level = get_npt_level,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0403634..526955d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -643,6 +643,8 @@ static unsigned long *vmx_io_bitmap_a;
>  static unsigned long *vmx_io_bitmap_b;
>  static unsigned long *vmx_msr_bitmap_legacy;
>  static unsigned long *vmx_msr_bitmap_longmode;
> +static unsigned long *vmx_msr_bitmap_legacy_x2apic;
> +static unsigned long *vmx_msr_bitmap_longmode_x2apic;
>  
>  static bool cpu_has_load_ia32_efer;
>  static bool cpu_has_load_perf_global_ctrl;
> @@ -767,6 +769,12 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
>  		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>  }
>  
> +static inline bool cpu_has_vmx_virtualize_x2apic_mode(void)
> +{
> +	return vmcs_config.cpu_based_2nd_exec_ctrl &
> +		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +}
> +
>  static inline bool cpu_has_vmx_apic_register_virt(void)
>  {
>  	return vmcs_config.cpu_based_2nd_exec_ctrl &
> @@ -1830,6 +1838,24 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
>  	vmx->guest_msrs[from] = tmp;
>  }
>  
> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long *msr_bitmap;
> +
> +	if (apic_x2apic_mode(vcpu->arch.apic))
> +		if (is_long_mode(vcpu))
> +			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
> +		else
> +			msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
> +	else
> +		if (is_long_mode(vcpu))
> +			msr_bitmap = vmx_msr_bitmap_longmode;
> +		else
> +			msr_bitmap = vmx_msr_bitmap_legacy;
> +
> +	vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
> +}
> +
>  /*
>   * Set up the vmcs to automatically save and restore system
>   * msrs.  Don't touch the 64-bit msrs if the guest is in legacy
> @@ -1838,7 +1864,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
>  static void setup_msrs(struct vcpu_vmx *vmx)
>  {
>  	int save_nmsrs, index;
> -	unsigned long *msr_bitmap;
>  
>  	save_nmsrs = 0;
>  #ifdef CONFIG_X86_64
> @@ -1870,14 +1895,8 @@ static void setup_msrs(struct vcpu_vmx *vmx)
>  
>  	vmx->save_nmsrs = save_nmsrs;
>  
> -	if (cpu_has_vmx_msr_bitmap()) {
> -		if (is_long_mode(&vmx->vcpu))
> -			msr_bitmap = vmx_msr_bitmap_longmode;
> -		else
> -			msr_bitmap = vmx_msr_bitmap_legacy;
> -
> -		vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
> -	}
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmx_set_msr_bitmap(&vmx->vcpu);
>  }
>  
>  /*
> @@ -2543,6 +2562,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
>  		min2 = 0;
>  		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> +			SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
>  			SECONDARY_EXEC_WBINVD_EXITING |
>  			SECONDARY_EXEC_ENABLE_VPID |
>  			SECONDARY_EXEC_ENABLE_EPT |
> @@ -3724,7 +3744,10 @@ static void free_vpid(struct vcpu_vmx *vmx)
>  	spin_unlock(&vmx_vpid_lock);
>  }
>  
> -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
> +#define MSR_TYPE_R	1
> +#define MSR_TYPE_W	2
> +static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +						u32 msr, int type)
>  {
>  	int f = sizeof(unsigned long);
>  
> @@ -3737,20 +3760,99 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
>  	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
>  	 */
>  	if (msr <= 0x1fff) {
> -		__clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */
> -		__clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */
> +		if (type & MSR_TYPE_R)
> +			/* read-low */
> +			__clear_bit(msr, msr_bitmap + 0x000 / f);
> +
> +		if (type & MSR_TYPE_W)
> +			/* write-low */
> +			__clear_bit(msr, msr_bitmap + 0x800 / f);
> +
>  	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
>  		msr &= 0x1fff;
> -		__clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */
> -		__clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */
> +		if (type & MSR_TYPE_R)
> +			/* read-high */
> +			__clear_bit(msr, msr_bitmap + 0x400 / f);
> +
> +		if (type & MSR_TYPE_W)
> +			/* write-high */
> +			__clear_bit(msr, msr_bitmap + 0xc00 / f);
> +
> +	}
> +}
> +
> +static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
> +						u32 msr, int type)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (!cpu_has_vmx_msr_bitmap())
> +		return;
> +
> +	/*
> +	 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
> +	 * have the write-low and read-high bitmap offsets the wrong way round.
> +	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
> +	 */
> +	if (msr <= 0x1fff) {
> +		if (type & MSR_TYPE_R)
> +			/* read-low */
> +			__set_bit(msr, msr_bitmap + 0x000 / f);
> +
> +		if (type & MSR_TYPE_W)
> +			/* write-low */
> +			__set_bit(msr, msr_bitmap + 0x800 / f);
> +
> +	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> +		msr &= 0x1fff;
> +		if (type & MSR_TYPE_R)
> +			/* read-high */
> +			__set_bit(msr, msr_bitmap + 0x400 / f);
> +
> +		if (type & MSR_TYPE_W)
> +			/* write-high */
> +			__set_bit(msr, msr_bitmap + 0xc00 / f);
> +
>  	}
>  }
>  
>  static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
>  {
>  	if (!longmode_only)
> -		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr);
> -	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr);
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy,
> +						msr, MSR_TYPE_R | MSR_TYPE_W);
> +	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode,
> +						msr, MSR_TYPE_R | MSR_TYPE_W);
> +}
> +
> +static void vmx_intercept_for_msr_read_x2apic(u32 msr, bool set)
> +{
> +	if (set) {
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_R);
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_R);
> +	} else {
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_R);
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_R);
> +	}
> +}
> +
> +static void vmx_intercept_for_msr_write_x2apic(u32 msr, bool set)
> +{
> +	if (set) {
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_W);
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_W);
> +	} else {
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_W);
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_W);
> +	}
>  }
>  
>  /*
> @@ -3848,6 +3950,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>  	if (!enable_apicv_reg)
>  		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> +	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>  	return exec_control;
>  }
>  
> @@ -6103,6 +6206,53 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  	vmcs_write32(TPR_THRESHOLD, irr);
>  }
>  
> +static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> +{
> +	u32 exec_control, sec_exec_control;
> +	int msr;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	/* There is not point to enable virtualize x2apic without enable
> +	 * apicv*/
> +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
> +		return;
> +
> +	if (set) {
> +		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> +		/* virtualize x2apic mode relies on tpr shadow */
> +		if (!(exec_control & CPU_BASED_TPR_SHADOW))
> +			return;
> +	}
> +
> +	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> +
> +	if (set) {
> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> +		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +	} else {
> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
> +			sec_exec_control |=
> +					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> +	}
> +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
> +
> +	if (set) {
> +		for (msr = 0x800; msr <= 0x8ff; msr++)
> +			vmx_intercept_for_msr_read_x2apic(msr, false);
> +
> +		/* According SDM, in x2apic mode, the whole id reg is used.
> +		 * But in KVM, it only use the highest eight bits. Need to
> +		 * intercept it */
> +		vmx_intercept_for_msr_read_x2apic(0x802, true);
> +		/* TMCCT */
> +		vmx_intercept_for_msr_read_x2apic(0x839, true);
> +		/* TPR */
> +		vmx_intercept_for_msr_write_x2apic(0x808, false);
> +	}
Do it during vmx init, not here. Here you only need to call
vmx_set_msr_bitmap().
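E.g. something along these lines in vmx_init() (untested sketch, using
the helpers this patch adds; vmx_init() would grow an int msr local),
right after the x2apic bitmaps are copied from the base ones:

	for (msr = 0x800; msr <= 0x8ff; msr++)
		vmx_intercept_for_msr_read_x2apic(msr, false);
	/* APIC ID and TMCCT still need software assistance */
	vmx_intercept_for_msr_read_x2apic(0x802, true);
	vmx_intercept_for_msr_read_x2apic(0x839, true);
	/* TPR */
	vmx_intercept_for_msr_write_x2apic(0x808, false);

vmx_set_virtual_x2apic_mode() then only flips the exec controls and
calls vmx_set_msr_bitmap().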

> +	vmx_set_msr_bitmap(vcpu);
> +}
> +
>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>  {
>  	u32 exit_intr_info;
> @@ -7366,6 +7516,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.enable_nmi_window = enable_nmi_window,
>  	.enable_irq_window = enable_irq_window,
>  	.update_cr8_intercept = update_cr8_intercept,
> +	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
>  
>  	.set_tss_addr = vmx_set_tss_addr,
>  	.get_tdp_level = get_ept_level,
> @@ -7419,11 +7570,19 @@ static int __init vmx_init(void)
>  	if (!vmx_msr_bitmap_legacy)
>  		goto out1;
>  
> +	vmx_msr_bitmap_legacy_x2apic =
> +				(unsigned long *)__get_free_page(GFP_KERNEL);
> +	if (!vmx_msr_bitmap_legacy_x2apic)
> +		goto out2;
>  
>  	vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
>  	if (!vmx_msr_bitmap_longmode)
> -		goto out2;
> +		goto out3;
>  
> +	vmx_msr_bitmap_longmode_x2apic =
> +				(unsigned long *)__get_free_page(GFP_KERNEL);
> +	if (!vmx_msr_bitmap_longmode_x2apic)
> +		goto out4;
>  
>  	/*
>  	 * Allow direct access to the PC debug port (it is often used for I/O
> @@ -7456,6 +7615,11 @@ static int __init vmx_init(void)
>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
>  
> +	memcpy(vmx_msr_bitmap_legacy_x2apic,
> +			vmx_msr_bitmap_legacy, PAGE_SIZE);
> +	memcpy(vmx_msr_bitmap_longmode_x2apic,
> +			vmx_msr_bitmap_longmode, PAGE_SIZE);
> +
>  	if (enable_ept) {
>  		kvm_mmu_set_mask_ptes(0ull,
>  			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
> @@ -7468,8 +7632,10 @@ static int __init vmx_init(void)
>  
>  	return 0;
>  
> -out3:
> +out4:
>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
> +out3:
> +	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
>  out2:
>  	free_page((unsigned long)vmx_msr_bitmap_legacy);
>  out1:
> @@ -7481,6 +7647,8 @@ out:
>  
>  static void __exit vmx_exit(void)
>  {
> +	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
> +	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
>  	free_page((unsigned long)vmx_msr_bitmap_legacy);
>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
>  	free_page((unsigned long)vmx_io_bitmap_b);
> -- 
> 1.7.1

--
			Gleb.


* RE: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-17 13:22   ` Gleb Natapov
@ 2013-01-18  1:49     ` Zhang, Yang Z
  0 siblings, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-18  1:49 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

Gleb Natapov wrote on 2013-01-17:
> On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
>> From: Yang Zhang <yang.z.zhang@Intel.com>
>> @@ -6103,6 +6206,53 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>  	vmcs_write32(TPR_THRESHOLD, irr);
>>  }
>> +static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>> +{
>> +	u32 exec_control, sec_exec_control;
>> +	int msr;
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	/* There is not point to enable virtualize x2apic without enable
>> +	 * apicv*/
>> +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
>> +		return;
>> +
>> +	if (set) {
>> +		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>> +		/* virtualize x2apic mode relies on tpr shadow */
>> +		if (!(exec_control & CPU_BASED_TPR_SHADOW))
>> +			return;
>> +	}
>> +
>> +	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +
>> +	if (set) {
>> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> +		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>> +	} else {
>> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>> +		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
>> +			sec_exec_control |=
>> +					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> +	}
>> +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
>> +
>> +	if (set) {
>> +		for (msr = 0x800; msr <= 0x8ff; msr++)
>> +			vmx_intercept_for_msr_read_x2apic(msr, false);
>> +
>> +		/* According SDM, in x2apic mode, the whole id reg is used.
>> +		 * But in KVM, it only use the highest eight bits. Need to
>> +		 * intercept it */
>> +		vmx_intercept_for_msr_read_x2apic(0x802, true);
>> +		/* TMCCT */
>> +		vmx_intercept_for_msr_read_x2apic(0x839, true);
>> +		/* TPR */
>> +		vmx_intercept_for_msr_write_x2apic(0x808, false);
>> +	}
> Do it during vmx init, not here. Here you only need to call
> vmx_set_msr_bitmap().
Sure, that is more reasonable.

>> +	vmx_set_msr_bitmap(vcpu);
>> +}
>> +
>>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>  {
>>  	u32 exit_intr_info;
>> @@ -7366,6 +7516,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>  	.enable_nmi_window = enable_nmi_window,
>>  	.enable_irq_window = enable_irq_window,
>>  	.update_cr8_intercept = update_cr8_intercept,
>> +	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
>> 
>>  	.set_tss_addr = vmx_set_tss_addr,
>>  	.get_tdp_level = get_ept_level,
>> @@ -7419,11 +7570,19 @@ static int __init vmx_init(void)
>>  	if (!vmx_msr_bitmap_legacy)
>>  		goto out1;
>> +	vmx_msr_bitmap_legacy_x2apic =
>> +				(unsigned long *)__get_free_page(GFP_KERNEL);
>> +	if (!vmx_msr_bitmap_legacy_x2apic)
>> +		goto out2;
>> 
>>  	vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
>>  	if (!vmx_msr_bitmap_longmode)
>> -		goto out2;
>> +		goto out3;
>> 
>> +	vmx_msr_bitmap_longmode_x2apic =
>> +				(unsigned long *)__get_free_page(GFP_KERNEL);
>> +	if (!vmx_msr_bitmap_longmode_x2apic)
>> +		goto out4;
>> 
>>  	/*
>>  	 * Allow direct access to the PC debug port (it is often used for I/O
>> @@ -7456,6 +7615,11 @@ static int __init vmx_init(void)
>>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
>>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
>> +	memcpy(vmx_msr_bitmap_legacy_x2apic,
>> +			vmx_msr_bitmap_legacy, PAGE_SIZE);
>> +	memcpy(vmx_msr_bitmap_longmode_x2apic,
>> +			vmx_msr_bitmap_longmode, PAGE_SIZE);
>> +
>>  	if (enable_ept) {
>>  		kvm_mmu_set_mask_ptes(0ull,
>>  			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
>> @@ -7468,8 +7632,10 @@ static int __init vmx_init(void)
>> 
>>  	return 0;
>> -out3:
>> +out4:
>>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
>> +out3:
>> +	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
>>  out2:
>>  	free_page((unsigned long)vmx_msr_bitmap_legacy);
>>  out1:
>> @@ -7481,6 +7647,8 @@ out:
>> 
>>  static void __exit vmx_exit(void)
>>  {
>> +	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
>> +	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
>>  	free_page((unsigned long)vmx_msr_bitmap_legacy);
>>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
>>  	free_page((unsigned long)vmx_io_bitmap_b);
>> --
>> 1.7.1
> 
> --
> 			Gleb.


Best regards,
Yang




* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-17  1:26   ` Zhang, Yang Z
@ 2013-01-20 12:51     ` Gleb Natapov
  2013-01-21  0:49       ` Zhang, Yang Z
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-20 12:51 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

On Thu, Jan 17, 2013 at 01:26:03AM +0000, Zhang, Yang Z wrote:
> Previous patch is stale. Resend the new patch.
> The only change is clear EOI and SELF-IPI reg in msr bitmap when vid is enabled.
> 
> ------------------------
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Virtual interrupt delivery frees KVM from injecting vAPIC interrupts
> manually; that is fully taken care of by the hardware. This needs
> some special awareness in the existing interrupt injection path:
> 
> - for a pending interrupt, instead of direct injection, we may need
>   to update architecture-specific indicators before resuming to the
>   guest.
> 
> - A pending interrupt that is masked by the ISR should also be
>   considered in the above update, since hardware will decide when to
>   inject it at the right time. The current has_interrupt and
>   get_interrupt only return a valid vector from an injection p.o.v.
> 
> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  arch/ia64/kvm/lapic.h           |    6 ++
>  arch/x86/include/asm/kvm_host.h |    7 ++
>  arch/x86/include/asm/vmx.h      |   11 +++
>  arch/x86/kvm/irq.c              |   56 +++++++++++-
>  arch/x86/kvm/lapic.c            |   69 ++++++++------
>  arch/x86/kvm/lapic.h            |   23 +++++
>  arch/x86/kvm/svm.c              |   25 +++++
>  arch/x86/kvm/vmx.c              |  188 +++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/x86.c              |   11 ++-
>  arch/x86/kvm/x86.h              |    2 +
>  include/linux/kvm_host.h        |    3 +
>  virt/kvm/ioapic.c               |   38 ++++++++
>  virt/kvm/ioapic.h               |    1 +
>  virt/kvm/irq_comm.c             |   25 +++++
>  virt/kvm/kvm_main.c             |    5 +
>  15 files changed, 425 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
> index c5f92a9..c3e2935 100644
> --- a/arch/ia64/kvm/lapic.h
> +++ b/arch/ia64/kvm/lapic.h
> @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
>  #define kvm_apic_present(x) (true)
>  #define kvm_lapic_enabled(x) (true)
>  
> +static inline bool kvm_apic_vid_enabled(void)
> +{
> +	/* IA64 has no apicv support; do nothing here */
> +	return false;
> +}
> +
>  #endif
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 35aa8e6..d90bfa8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -697,6 +697,12 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> +	int (*has_virtual_interrupt_delivery)(void);
> +	void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
> +	void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
> +	void (*set_svi)(int isr);
> +	void (*check_ioapic_entry)(struct kvm_vcpu *vcpu,
> +				   struct kvm_lapic_irq *irq);
>  	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
> @@ -992,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
> +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 0a54df0..694586c 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -62,6 +62,7 @@
>  #define EXIT_REASON_MCE_DURING_VMENTRY  41
>  #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
>  #define EXIT_REASON_APIC_ACCESS         44
> +#define EXIT_REASON_EOI_INDUCED         45
>  #define EXIT_REASON_EPT_VIOLATION       48
>  #define EXIT_REASON_EPT_MISCONFIG       49
>  #define EXIT_REASON_WBINVD              54
> @@ -144,6 +145,7 @@
>  #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
>  #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
>  #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
> +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
>  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
>  #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
>  
> @@ -181,6 +183,7 @@ enum vmcs_field {
>  	GUEST_GS_SELECTOR               = 0x0000080a,
>  	GUEST_LDTR_SELECTOR             = 0x0000080c,
>  	GUEST_TR_SELECTOR               = 0x0000080e,
> +	GUEST_INTR_STATUS               = 0x00000810,
>  	HOST_ES_SELECTOR                = 0x00000c00,
>  	HOST_CS_SELECTOR                = 0x00000c02,
>  	HOST_SS_SELECTOR                = 0x00000c04,
> @@ -208,6 +211,14 @@ enum vmcs_field {
>  	APIC_ACCESS_ADDR_HIGH		= 0x00002015,
>  	EPT_POINTER                     = 0x0000201a,
>  	EPT_POINTER_HIGH                = 0x0000201b,
> +	EOI_EXIT_BITMAP0                = 0x0000201c,
> +	EOI_EXIT_BITMAP0_HIGH           = 0x0000201d,
> +	EOI_EXIT_BITMAP1                = 0x0000201e,
> +	EOI_EXIT_BITMAP1_HIGH           = 0x0000201f,
> +	EOI_EXIT_BITMAP2                = 0x00002020,
> +	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
> +	EOI_EXIT_BITMAP3                = 0x00002022,
> +	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
>  	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>  	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
>  	VMCS_LINK_POINTER               = 0x00002800,
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index b111aee..45a254e 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
>  
>  /*
> + * check if there is a pending interrupt from
> + * a non-APIC source, without intack.
> + */
> +static int kvm_cpu_has_extint(struct kvm_vcpu *v)
> +{
> +	if (kvm_apic_accept_pic_intr(v))
> +		return pic_irqchip(v->kvm)->output;	/* PIC */
> +	else
> +		return 0;
> +}
> +
> +/*
> + * check if there is an injectable interrupt:
> + * when virtual interrupt delivery is enabled,
> + * interrupts from the apic are handled by hardware,
> + * so we don't need to check them here.
> + */
> +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
> +{
> +	if (!irqchip_in_kernel(v->kvm))
> +		return v->arch.interrupt.pending;
> +
> +	if (kvm_cpu_has_extint(v))
> +		return 1;
> +
> +	if (kvm_apic_vid_enabled())
> +		return 0;
> +
> +	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
> +}
> +
> +/*
>   * check if there is pending interrupt without
>   * intack.
>   */
> @@ -46,27 +78,41 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
>  	if (!irqchip_in_kernel(v->kvm))
>  		return v->arch.interrupt.pending;
>  
> -	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
> -		return pic_irqchip(v->kvm)->output;	/* PIC */
> +	if (kvm_cpu_has_extint(v))
> +		return 1;
>  
>  	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
>  }
>  EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
>  
>  /*
> + * Read the pending interrupt vector (from a
> + * non-APIC source) and intack.
> + */
> +static int kvm_cpu_get_extint(struct kvm_vcpu *v)
> +{
> +	if (kvm_cpu_has_extint(v))
> +		return kvm_pic_read_irq(v->kvm); /* PIC */
> +	return -1;
> +}
> +
> +/*
>   * Read pending interrupt vector and intack.
>   */
>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>  {
> +	int vector;
> +
>  	if (!irqchip_in_kernel(v->kvm))
>  		return v->arch.interrupt.nr;
>  
> -	if (kvm_apic_accept_pic_intr(v) && pic_irqchip(v->kvm)->output)
> -		return kvm_pic_read_irq(v->kvm);	/* PIC */
> +	vector = kvm_cpu_get_extint(v);
> +
> +	if (kvm_apic_vid_enabled() || vector != -1)
> +		return vector;			/* PIC */
>  
>  	return kvm_get_apic_interrupt(v);	/* APIC */
>  }
> -EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
>  
>  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f39aee3..d96515e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -145,23 +145,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>  	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>  
> -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
> -{
> -	u16 cid;
> -	ldr >>= 32 - map->ldr_bits;
> -	cid = (ldr >> map->cid_shift) & map->cid_mask;
> -
> -	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
> -
> -	return cid;
> -}
> -
> -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
> -{
> -	ldr >>= (32 - map->ldr_bits);
> -	return ldr & map->lid_mask;
> -}
> -
>  static void recalculate_apic_map(struct kvm *kvm)
>  {
>  	struct kvm_apic_map *new, *old = NULL;
> @@ -225,6 +208,8 @@ out:
>  
>  	if (old)
>  		kfree_rcu(old, rcu);
> +
> +	ioapic_update_eoi_exitmap(kvm);
>  }
>  
>  static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
> @@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  {
>  	int result;
>  
> +	/* Note that irr_pending is just a hint. It will be always
> +	 * true with virtual interrupt delivery enabled. */
This is not the correct format for multi-line comments.
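For reference, kernel style for multi-line comments would be:

	/*
	 * Note that irr_pending is just a hint.  It will always be
	 * true with virtual interrupt delivery enabled.
	 */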

>  	if (!apic->irr_pending)
>  		return -1;
>  
> @@ -456,6 +443,8 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>  	int result;
> +
> +	/* Note that isr_count is always 1 with vid enabled*/
Missing space before */.

>  	if (!apic->isr_count)
>  		return -1;
>  	if (likely(apic->highest_isr_cache != -1))
> @@ -735,6 +724,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>  	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>  }
>  
> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> +{
> +	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> +	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
> +		int trigger_mode;
> +		if (apic_test_vector(vector, apic->regs + APIC_TMR))
> +			trigger_mode = IOAPIC_LEVEL_TRIG;
> +		else
> +			trigger_mode = IOAPIC_EDGE_TRIG;
> +		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> +	}
> +}
> +
>  static int apic_set_eoi(struct kvm_lapic *apic)
>  {
>  	int vector = apic_find_highest_isr(apic);
> @@ -751,19 +753,26 @@ static int apic_set_eoi(struct kvm_lapic *apic)
>  	apic_clear_isr(vector, apic);
>  	apic_update_ppr(apic);
>  
> -	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> -	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
> -		int trigger_mode;
> -		if (apic_test_vector(vector, apic->regs + APIC_TMR))
> -			trigger_mode = IOAPIC_LEVEL_TRIG;
> -		else
> -			trigger_mode = IOAPIC_EDGE_TRIG;
> -		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> -	}
> +	kvm_ioapic_send_eoi(apic, vector);
>  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>  	return vector;
>  }
>  
> +/*
> + * this interface assumes a trap-like exit, which has already finished
> + * the desired side effects, including the vISR and vPPR updates.
> + */
> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	trace_kvm_eoi(apic, vector);
> +
> +	kvm_ioapic_send_eoi(apic, vector);
> +	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> +}
> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
> +
>  static void apic_send_ipi(struct kvm_lapic *apic)
>  {
>  	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
> @@ -1374,8 +1383,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>  		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
>  		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
>  	}
> -	apic->irr_pending = false;
> -	apic->isr_count = 0;
> +	apic->irr_pending = kvm_apic_vid_enabled();
> +	apic->isr_count = kvm_apic_vid_enabled();
>  	apic->highest_isr_cache = -1;
>  	update_divide_count(apic);
>  	atomic_set(&apic->lapic_timer.pending, 0);
> @@ -1590,8 +1599,10 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
>  	update_divide_count(apic);
>  	start_apic_timer(apic);
>  	apic->irr_pending = true;
> -	apic->isr_count = count_vectors(apic->regs + APIC_ISR);
> +	apic->isr_count = kvm_apic_vid_enabled() ?
> +				1 : count_vectors(apic->regs + APIC_ISR);
>  	apic->highest_isr_cache = -1;
> +	kvm_x86_ops->set_svi(apic_find_highest_isr(apic));
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 22a5397..109e37c 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -65,6 +65,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
>  void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
>  
>  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
>  
>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
> @@ -131,4 +132,26 @@ static inline int apic_x2apic_mode(struct kvm_lapic *apic)
>  	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
>  }
>  
> +static inline bool kvm_apic_vid_enabled(void)
> +{
> +	return kvm_x86_ops->has_virtual_interrupt_delivery();
> +}
> +
> +static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
> +{
> +	u16 cid;
> +	ldr >>= 32 - map->ldr_bits;
> +	cid = (ldr >> map->cid_shift) & map->cid_mask;
> +
> +	BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
> +
> +	return cid;
> +}
> +
> +static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
> +{
> +	ldr >>= (32 - map->ldr_bits);
> +	return ldr & map->lid_mask;
> +}
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 38407e9..53c31cb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3576,6 +3576,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  	return;
>  }
>  
> +static int svm_has_virtual_interrupt_delivery(void)
> +{
> +	return 0;
> +}
> +
> +static void svm_update_eoi_exitmap(struct kvm_vcpu *vcpu)
> +{
> +	return;
> +}
> +
> +static void svm_set_svi(int isr)
> +{
> +	return;
> +}
> +
> +static void svm_check_ioapic_entry(struct kvm_vcpu *vcpu,
> +				   struct kvm_lapic_irq *irq)
> +{
> +	return;
> +}
> +
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4296,6 +4317,10 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.enable_irq_window = enable_irq_window,
>  	.update_cr8_intercept = update_cr8_intercept,
>  	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
> +	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
> +	.update_eoi_exitmap = svm_update_eoi_exitmap,
> +	.set_svi = svm_set_svi,
> +	.check_ioapic_entry = svm_check_ioapic_entry,
>  
>  	.set_tss_addr = svm_set_tss_addr,
>  	.get_tdp_level = get_npt_level,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 526955d..b0de04f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -84,8 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
>  static bool __read_mostly fasteoi = 1;
>  module_param(fasteoi, bool, S_IRUGO);
>  
> -static bool __read_mostly enable_apicv_reg = 1;
> -module_param(enable_apicv_reg, bool, S_IRUGO);
> +static bool __read_mostly enable_apicv_reg_vid = 1;
> +module_param(enable_apicv_reg_vid, bool, S_IRUGO);
>  
>  /*
>   * If nested=1, nested virtualization is supported, i.e., guests may use
> @@ -433,6 +433,8 @@ struct vcpu_vmx {
>  
>  	bool rdtscp_enabled;
>  
> +	u64 eoi_exit_bitmap[4];
> +
>  	/* Support for a guest hypervisor (nested VMX) */
>  	struct nested_vmx nested;
>  };
> @@ -781,6 +783,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
>  		SECONDARY_EXEC_APIC_REGISTER_VIRT;
>  }
>  
> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
> +{
> +	return vmcs_config.cpu_based_2nd_exec_ctrl &
> +		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> +}
> +
>  static inline bool cpu_has_vmx_flexpriority(void)
>  {
>  	return cpu_has_vmx_tpr_shadow() &&
> @@ -2570,7 +2578,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>  			SECONDARY_EXEC_RDTSCP |
>  			SECONDARY_EXEC_ENABLE_INVPCID |
> -			SECONDARY_EXEC_APIC_REGISTER_VIRT;
> +			SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>  		if (adjust_vmx_controls(min2, opt2,
>  					MSR_IA32_VMX_PROCBASED_CTLS2,
>  					&_cpu_based_2nd_exec_control) < 0)
> @@ -2584,7 +2593,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  
>  	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
>  		_cpu_based_2nd_exec_control &= ~(
> -				SECONDARY_EXEC_APIC_REGISTER_VIRT);
> +				SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>  
>  	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>  		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
> @@ -2783,8 +2793,14 @@ static __init int hardware_setup(void)
>  	if (!cpu_has_vmx_ple())
>  		ple_gap = 0;
>  
> -	if (!cpu_has_vmx_apic_register_virt())
> -		enable_apicv_reg = 0;
> +	if (!cpu_has_vmx_apic_register_virt() ||
> +				!cpu_has_vmx_virtual_intr_delivery())
> +		enable_apicv_reg_vid = 0;
> +
> +	if (enable_apicv_reg_vid)
> +		kvm_x86_ops->update_cr8_intercept = NULL;
> +	else
> +		kvm_x86_ops->update_apic_irq = NULL;
>  
>  	if (nested)
>  		nested_vmx_setup_ctls_msrs();
> @@ -3948,8 +3964,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
>  	if (!ple_gap)
>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
> -	if (!enable_apicv_reg)
> -		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> +	if (!enable_apicv_reg_vid)
> +		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>  	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>  	return exec_control;
>  }
> @@ -3995,6 +4012,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  				vmx_secondary_exec_control(vmx));
>  	}
>  
> +	if (enable_apicv_reg_vid) {
> +		vmcs_write64(EOI_EXIT_BITMAP0, 0);
> +		vmcs_write64(EOI_EXIT_BITMAP1, 0);
> +		vmcs_write64(EOI_EXIT_BITMAP2, 0);
> +		vmcs_write64(EOI_EXIT_BITMAP3, 0);
> +
> +		vmcs_write16(GUEST_INTR_STATUS, 0);
> +	}
> +
>  	if (ple_gap) {
>  		vmcs_write32(PLE_GAP, ple_gap);
>  		vmcs_write32(PLE_WINDOW, ple_window);
> @@ -4912,6 +4938,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
>  	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>  }
>  
> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +	int vector = exit_qualification & 0xff;
> +
> +	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
> +	kvm_apic_set_eoi_accelerated(vcpu, vector);
> +	return 1;
> +}
> +
>  static int handle_apic_write(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> @@ -5857,6 +5893,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
>  	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
>  	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
> +	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
>  	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
>  	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
>  	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
> @@ -6214,7 +6251,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  
>  	/* There is not point to enable virtualize x2apic without enable
>  	 * apicv*/
Wrong format of multi-line comment.

> -	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
> +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg_vid)
>  		return;
>  
>  	if (set) {
> @@ -6249,10 +6286,138 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  		vmx_intercept_for_msr_read_x2apic(0x839, true);
>  		/* TPR */
>  		vmx_intercept_for_msr_write_x2apic(0x808, false);
> +		/* EOI */
> +		vmx_intercept_for_msr_write_x2apic(0x80b, false);
> +		/* SELF-IPI */
> +		vmx_intercept_for_msr_write_x2apic(0x83f, false);
>  	}
>  	vmx_set_msr_bitmap(vcpu);
>  }
>  
> +static int vmx_has_virtual_interrupt_delivery(void)
> +{
> +	return enable_apicv_reg_vid;
> +}
> +
> +static void vmx_set_svi(int isr)
> +{
> +	u16 status;
> +	u8 old;
> +
> +	if (!enable_apicv_reg_vid)
> +		return;
> +
> +	if (isr == -1)
> +		isr = 0;
> +
> +	status = vmcs_read16(GUEST_INTR_STATUS);
> +	old = status >> 8;
> +	if (isr != old) {
> +		status &= 0xff;
> +		status |= isr << 8;
> +		vmcs_write16(GUEST_INTR_STATUS, status);
> +	}
> +}
> +
> +static void vmx_set_rvi(int vector)
> +{
> +	u16 status;
> +	u8 old;
> +
> +	status = vmcs_read16(GUEST_INTR_STATUS);
> +	old = (u8)status & 0xff;
> +	if ((u8)vector != old) {
> +		status &= ~0xff;
> +		status |= (u8)vector;
> +		vmcs_write16(GUEST_INTR_STATUS, status);
> +	}
> +}
> +
> +static void vmx_update_apic_irq(struct kvm_vcpu *vcpu, int max_irr)
> +{
> +	if (max_irr == -1)
> +		return;
> +
> +	vmx_set_rvi(max_irr);
> +}
> +
> +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
> +				u32 vector)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (WARN_ONCE((vector > 255),
> +		"KVM VMX: vector (%d) out of range\n", vector))
> +		return;
> +
> +	__set_bit(vector, (unsigned long *)vmx->eoi_exit_bitmap);
> +}
> +
> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
> +				   struct kvm_lapic_irq *irq)
> +{
> +	struct kvm_lapic **dst;
> +	struct kvm_apic_map *map;
> +	unsigned long bitmap = 1;
> +	int i;
> +
> +	rcu_read_lock();
> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
> +
> +	if (unlikely(!map)) {
> +		set_eoi_exitmap_one(vcpu, irq->vector);
> +		goto out;
> +	}
> +
> +	if (irq->dest_mode == 0) { /* physical mode */
> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
> +				irq->dest_id == 0xff) {
> +			set_eoi_exitmap_one(vcpu, irq->vector);
> +			goto out;
> +		}
> +		dst = &map->phys_map[irq->dest_id & 0xff];
> +	} else {
> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
> +
> +		dst = map->logical_map[apic_cluster_id(map, mda)];
> +
> +		bitmap = apic_logical_id(map, mda);
> +	}
> +
> +	for_each_set_bit(i, &bitmap, 16) {
> +		if (!dst[i])
> +			continue;
> +		if (dst[i]->vcpu == vcpu) {
> +			set_eoi_exitmap_one(vcpu, irq->vector);
> +			break;
> +		}
> +	}
> +
> +out:
> +	rcu_read_unlock();
> +}
The logic in this function belongs in the lapic code. The only thing
that is specific to vmx in the function is the setting of the bit in
vmx->eoi_exit_bitmap, but since eoi_exit_bitmap is calculated and
loaded during the same vcpu entry we do not need vmx->eoi_exit_bitmap
at all. Declare it on the stack in vmx_update_eoi_exitmap() and pass
it to set_eoi_exitmap() and vmx_load_eoi_exitmap().
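One way the suggested refactor could look (set_eoi_exitmap() would grow a
matching bitmap parameter; both signatures here are assumptions, not the
existing API):

static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
{
	vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
	vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
	vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
}

static void vmx_update_eoi_exitmap(struct kvm_vcpu *vcpu)
{
	u64 eoi_exit_bitmap[4] = {};	/* on the stack, zeroed */

	set_eoi_exitmap(vcpu, eoi_exit_bitmap);
	vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap);
}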

> +
> +static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	vmcs_write64(EOI_EXIT_BITMAP0, vmx->eoi_exit_bitmap[0]);
> +	vmcs_write64(EOI_EXIT_BITMAP1, vmx->eoi_exit_bitmap[1]);
> +	vmcs_write64(EOI_EXIT_BITMAP2, vmx->eoi_exit_bitmap[2]);
> +	vmcs_write64(EOI_EXIT_BITMAP3, vmx->eoi_exit_bitmap[3]);
> +}
> +
> +static void vmx_update_eoi_exitmap(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	/* clear eoi exit bitmap */
> +	memset(vmx->eoi_exit_bitmap, 0, 32);
> +
> +	set_eoi_exitmap(vcpu);
> +	vmx_load_eoi_exitmap(vcpu);
> +}
> +
>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>  {
>  	u32 exit_intr_info;
> @@ -7517,6 +7682,11 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.enable_irq_window = enable_irq_window,
>  	.update_cr8_intercept = update_cr8_intercept,
>  	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
> +	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
> +	.check_ioapic_entry = vmx_check_ioapic_entry,
> +	.update_apic_irq = vmx_update_apic_irq,
> +	.update_eoi_exitmap = vmx_update_eoi_exitmap,
> +	.set_svi = vmx_set_svi,
>  
>  	.set_tss_addr = vmx_set_tss_addr,
>  	.get_tdp_level = get_ept_level,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1c9c834..1f5b5ac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5527,7 +5527,7 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>  			vcpu->arch.nmi_injected = true;
>  			kvm_x86_ops->set_nmi(vcpu);
>  		}
> -	} else if (kvm_cpu_has_interrupt(vcpu)) {
> +	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
>  		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>  			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
>  					    false);
> @@ -5648,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_handle_pmu_event(vcpu);
>  		if (kvm_check_request(KVM_REQ_PMI, vcpu))
>  			kvm_deliver_pmi(vcpu);
> +		if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
> +			kvm_x86_ops->update_eoi_exitmap(vcpu);
>  	}
>  
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> @@ -5656,10 +5658,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		/* enable NMI/IRQ window open exits if needed */
>  		if (vcpu->arch.nmi_pending)
>  			kvm_x86_ops->enable_nmi_window(vcpu);
> -		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> +		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
>  			kvm_x86_ops->enable_irq_window(vcpu);
>  
>  		if (kvm_lapic_enabled(vcpu)) {
> +			/* update architecture specific hints for APIC
> +			 * virtual interrupt delivery */
Multi-line comment again.

> +			if (kvm_x86_ops->update_apic_irq)
> +				kvm_x86_ops->update_apic_irq(vcpu,
> +					      kvm_lapic_find_highest_irr(vcpu));
>  			update_cr8_intercept(vcpu);
>  			kvm_lapic_sync_to_vapic(vcpu);
>  		}
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index e224f7a..e8e7b6a 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -122,6 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>  	gva_t addr, void *val, unsigned int bytes,
>  	struct x86_exception *exception);
>  
> +void set_eoi_exitmap(struct kvm_vcpu *vcpu);
> +
>  extern u64 host_xcr0;
>  
>  extern struct static_key kvm_no_apic_vcpu;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cbe0d68..bc0e261 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -122,6 +122,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_WATCHDOG          18
>  #define KVM_REQ_MASTERCLOCK_UPDATE 19
>  #define KVM_REQ_MCLOCK_INPROGRESS 20
> +#define KVM_REQ_EOIBITMAP         21
>  
>  #define KVM_USERSPACE_IRQ_SOURCE_ID		0
>  #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
> @@ -537,6 +538,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>  void kvm_flush_remote_tlbs(struct kvm *kvm);
>  void kvm_reload_remote_mmus(struct kvm *kvm);
>  void kvm_make_mclock_inprogress_request(struct kvm *kvm);
> +void kvm_make_update_eoibitmap_request(struct kvm *kvm);
>  
>  long kvm_arch_dev_ioctl(struct file *filp,
>  			unsigned int ioctl, unsigned long arg);
> @@ -690,6 +692,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
>  int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level);
>  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
>  		int irq_source_id, int level);
> +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  				   struct kvm_irq_ack_notifier *kian);
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index f3abbef..d747263 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -35,6 +35,7 @@
>  #include <linux/hrtimer.h>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <linux/export.h>
>  #include <asm/processor.h>
>  #include <asm/page.h>
>  #include <asm/current.h>
> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
>  	smp_wmb();
>  }
>  
> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
> +{
This function is exported from the file and needs a more unique
name; kvm_ioapic_calculate_eoi_exitmap(), for instance.

> +	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> +	union kvm_ioapic_redirect_entry *e;
> +	struct kvm_lapic_irq irqe;
> +	int index;
> +
> +	spin_lock(&ioapic->lock);
> +	/* traverse the ioapic entries to set the eoi exit bitmap */
> +	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
> +		e = &ioapic->redirtbl[index];
> +		if (!e->fields.mask &&
> +			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> +			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> +				 index))) {
> +			irqe.dest_id = e->fields.dest_id;
> +			irqe.vector = e->fields.vector;
> +			irqe.dest_mode = e->fields.dest_mode;
> +			irqe.delivery_mode = e->fields.delivery_mode << 8;
> +			kvm_x86_ops->check_ioapic_entry(vcpu, &irqe);
> +
> +		}
> +	}
> +	spin_unlock(&ioapic->lock);
> +}
> +EXPORT_SYMBOL_GPL(set_eoi_exitmap);
> +
> +void ioapic_update_eoi_exitmap(struct kvm *kvm)
> +{
Same. Add a kvm_ prefix here.

> +	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
> +
> +	if (!kvm_apic_vid_enabled() || !ioapic)
> +		return;
> +	kvm_make_update_eoibitmap_request(kvm);
> +}
> +
>  static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>  {
>  	unsigned index;
> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>  		    && ioapic->irr & (1 << index))
>  			ioapic_service(ioapic, index);
> +		ioapic_update_eoi_exitmap(ioapic->kvm);
ioapic_write_indirect() is called under ioapic->lock,
ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
code?

>  		break;
>  	}
>  }

kvm_set_ioapic() needs to call ioapic_update_eoi_exitmap() too.
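I.e., roughly, at the end of kvm_set_ioapic() after the restored state has
been copied in (exact placement is an assumption):

	/* recompute the eoi exit bitmap for the restored entries */
	ioapic_update_eoi_exitmap(kvm);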

> diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
> index a30abfe..e8d67cf 100644
> --- a/virt/kvm/ioapic.h
> +++ b/virt/kvm/ioapic.h
> @@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq);
>  int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
>  int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
> +void ioapic_update_eoi_exitmap(struct kvm *kvm);
>  
>  #endif
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 656fa45..84195f8 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -22,6 +22,7 @@
>  
>  #include <linux/kvm_host.h>
>  #include <linux/slab.h>
> +#include <linux/export.h>
>  #include <trace/events/kvm.h>
>  
>  #include <asm/msidef.h>
> @@ -237,6 +238,28 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level)
>  	return ret;
>  }
>  
> +bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
> +{
> +	struct kvm_irq_ack_notifier *kian;
> +	struct hlist_node *n;
> +	int gsi;
> +
> +	rcu_read_lock();
> +	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	if (gsi != -1)
> +		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
> +					 link)
> +			if (kian->gsi == gsi) {
> +				rcu_read_unlock();
> +				return true;
> +			}
> +
> +	rcu_read_unlock();
> +
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
> +
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  {
>  	struct kvm_irq_ack_notifier *kian;
> @@ -261,6 +284,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  	mutex_lock(&kvm->irq_lock);
>  	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
>  	mutex_unlock(&kvm->irq_lock);
> +	ioapic_update_eoi_exitmap(kvm);
>  }
>  
>  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> @@ -270,6 +294,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>  	hlist_del_init_rcu(&kian->link);
>  	mutex_unlock(&kvm->irq_lock);
>  	synchronize_rcu();
> +	ioapic_update_eoi_exitmap(kvm);
>  }
>  
>  int kvm_request_irq_source_id(struct kvm *kvm)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e45c20c..cc465c6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -217,6 +217,11 @@ void kvm_make_mclock_inprogress_request(struct kvm *kvm)
>  	make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
>  }
>  
> +void kvm_make_update_eoibitmap_request(struct kvm *kvm)
> +{
> +	make_all_cpus_request(kvm, KVM_REQ_EOIBITMAP);
> +}
> +
>  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  {
>  	struct page *page;
> -- 
> 1.7.1

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-20 12:51     ` Gleb Natapov
@ 2013-01-21  0:49       ` Zhang, Yang Z
  2013-01-21  5:03         ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-21  0:49 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

Gleb Natapov wrote on 2013-01-20:
> On Thu, Jan 17, 2013 at 01:26:03AM +0000, Zhang, Yang Z wrote:
>> Previous patch is stale. Resend the new patch.
>> The only change is clear EOI and SELF-IPI reg in msr bitmap when vid is enabled.
>> 
>> ------------------------
>> @@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic
> *apic)
>>  {
>>  	int result;
>> +	/* Note that irr_pending is just a hint. It will be always
>> +	 * true with virtual interrupt delivery enabled. */
> This is not correct format for multi-line comments.
Sure, will correct it here and below.

>> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
>> +				   struct kvm_lapic_irq *irq)
>> +{
>> +	struct kvm_lapic **dst;
>> +	struct kvm_apic_map *map;
>> +	unsigned long bitmap = 1;
>> +	int i;
>> +
>> +	rcu_read_lock();
>> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
>> +
>> +	if (unlikely(!map)) {
>> +		set_eoi_exitmap_one(vcpu, irq->vector);
>> +		goto out;
>> +	}
>> +
>> +	if (irq->dest_mode == 0) { /* physical mode */
>> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
>> +				irq->dest_id == 0xff) {
>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>> +			goto out;
>> +		}
>> +		dst = &map->phys_map[irq->dest_id & 0xff];
>> +	} else {
>> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
>> +
>> +		dst = map->logical_map[apic_cluster_id(map, mda)];
>> +
>> +		bitmap = apic_logical_id(map, mda);
>> +	}
>> +
>> +	for_each_set_bit(i, &bitmap, 16) {
>> +		if (!dst[i])
>> +			continue;
>> +		if (dst[i]->vcpu == vcpu) {
>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>> +			break;
>> +		}
>> +	}
>> +
>> +out:
>> +	rcu_read_unlock();
>> +}
> The logic in this function belongs to lapic code. The only thing
> that is specific to vmx in the function is setting of the bit in
> vmx->eoi_exit_bitmap, but since eoi_exit_bitmap is calculated and
> loaded during same vcpu entry we do not need vmx->eoi_exit_bitmap at
> all. Declare it on a stack in vmx_update_eoi_exitmap() and pass it to
> set_eoi_exitmap() and vmx_load_eoi_exitmap().
IIRC, this logic was in the lapic code before v7, and you suggested moving the whole function into vmx code.
So is it better to move it back to the lapic file?

>> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic
> *ioapic)
>>  	smp_wmb();
>>  }
>> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
>> +{
> This function is exported from the file and need to have more unique
> name. kvm_ioapic_calculate_eoi_exitmap() for instance.
Ok.

>> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic
> *ioapic, u32 val)
>>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>>  		    && ioapic->irr & (1 << index))
>>  			ioapic_service(ioapic, index);
>> +		ioapic_update_eoi_exitmap(ioapic->kvm);
> ioapic_write_indirect() is called under ioapic->lock,
> ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
> code?
ioapic_update_eoi_exitmap doesn't take any lock.

I will do full testing of every patch before sending it out; it covers both Windows and Linux guests.

Best regards,
Yang


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21  0:49       ` Zhang, Yang Z
@ 2013-01-21  5:03         ` Gleb Natapov
  2013-01-21  7:18           ` Zhang, Yang Z
                             ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Gleb Natapov @ 2013-01-21  5:03 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

On Mon, Jan 21, 2013 at 12:49:01AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-01-20:
> > On Thu, Jan 17, 2013 at 01:26:03AM +0000, Zhang, Yang Z wrote:
> >> Previous patch is stale. Resend the new patch.
> >> The only change is clear EOI and SELF-IPI reg in msr bitmap when vid is enabled.
> >> 
> >> ------------------------
> >> @@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic
> > *apic)
> >>  {
> >>  	int result;
> >> +	/* Note that irr_pending is just a hint. It will be always
> >> +	 * true with virtual interrupt delivery enabled. */
> > This is not correct format for multi-line comments.
> Sure, will correct it here and below.
> 
> >> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
> >> +				   struct kvm_lapic_irq *irq)
> >> +{
> >> +	struct kvm_lapic **dst;
> >> +	struct kvm_apic_map *map;
> >> +	unsigned long bitmap = 1;
> >> +	int i;
> >> +
> >> +	rcu_read_lock();
> >> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
> >> +
> >> +	if (unlikely(!map)) {
> >> +		set_eoi_exitmap_one(vcpu, irq->vector);
> >> +		goto out;
> >> +	}
> >> +
> >> +	if (irq->dest_mode == 0) { /* physical mode */
> >> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
> >> +				irq->dest_id == 0xff) {
> >> +			set_eoi_exitmap_one(vcpu, irq->vector);
> >> +			goto out;
> >> +		}
> >> +		dst = &map->phys_map[irq->dest_id & 0xff];
> >> +	} else {
> >> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
> >> +
> >> +		dst = map->logical_map[apic_cluster_id(map, mda)];
> >> +
> >> +		bitmap = apic_logical_id(map, mda);
> >> +	}
> >> +
> >> +	for_each_set_bit(i, &bitmap, 16) {
> >> +		if (!dst[i])
> >> +			continue;
> >> +		if (dst[i]->vcpu == vcpu) {
> >> +			set_eoi_exitmap_one(vcpu, irq->vector);
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +out:
> >> +	rcu_read_unlock();
> >> +}
> > The logic in this function belongs to lapic code. The only thing
> > that is specific to vmx in the function is setting of the bit in
> > vmx->eoi_exit_bitmap, but since eoi_exit_bitmap is calculated and
> > loaded during same vcpu entry we do not need vmx->eoi_exit_bitmap at
> > all. Declare it on a stack in vmx_update_eoi_exitmap() and pass it to
> > set_eoi_exitmap() and vmx_load_eoi_exitmap().
> IIRC, this logic is in lapic before v7. And you suggested to move the whole function into vmx code.
> So, it better to move back to lapic file?
> 
IIRC I suggested calling it only from vmx, not moving it there. Before
that, the calculation was done even with vid disabled and only the
result was ignored. With the current logic KVM_REQ_EOIBITMAP will be
set only with vid enabled, so the calculation will not be done
needlessly.


> >> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic
> > *ioapic)
> >>  	smp_wmb();
> >>  }
> >> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
> >> +{
> > This function is exported from the file and need to have more unique
> > name. kvm_ioapic_calculate_eoi_exitmap() for instance.
> Ok.
> 
> >> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic
> > *ioapic, u32 val)
> >>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
> >>  		    && ioapic->irr & (1 << index))
> >>  			ioapic_service(ioapic, index);
> >> +		ioapic_update_eoi_exitmap(ioapic->kvm);
> > ioapic_write_indirect() is called under ioapic->lock,
> > ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
> > code?
> ioapic_update_eoi_exitmap doesn't take any lock.
> 
Sorry, you are correct. I confused two different functions.

> I will do a full testing for every patch before sending out. It covers both windows and Linux guest.
> 
We are getting close, so please test with the userspace irq chip too.
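(For reference, the userspace irqchip path can typically be exercised with
something like

	qemu-system-x86_64 -machine pc,kernel_irqchip=off ...

though whether the flag is available depends on the QEMU version in use;
treat the exact invocation as an assumption.)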

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21  5:03         ` Gleb Natapov
@ 2013-01-21  7:18           ` Zhang, Yang Z
  2013-01-21 21:08           ` Marcelo Tosatti
  2013-01-23  0:45           ` Zhang, Yang Z
  2 siblings, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-21  7:18 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

Gleb Natapov wrote on 2013-01-21:
> On Mon, Jan 21, 2013 at 12:49:01AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2013-01-20:
>>> On Thu, Jan 17, 2013 at 01:26:03AM +0000, Zhang, Yang Z wrote:
>>>> Previous patch is stale. Resend the new patch. The only change is
>>>> clear EOI and SELF-IPI reg in msr bitmap when vid is enabled.
>>>> 
>>>> ------------------------
>>>> @@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic
>>> *apic)
>>>>  {
>>>>  	int result;
>>>> +	/* Note that irr_pending is just a hint. It will be always
>>>> +	 * true with virtual interrupt delivery enabled. */
>>> This is not correct format for multi-line comments.
>> Sure, will correct it here and below.
>> 
>>>> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
>>>> +				   struct kvm_lapic_irq *irq)
>>>> +{
>>>> +	struct kvm_lapic **dst;
>>>> +	struct kvm_apic_map *map;
>>>> +	unsigned long bitmap = 1;
>>>> +	int i;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
>>>> +
>>>> +	if (unlikely(!map)) {
>>>> +		set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (irq->dest_mode == 0) { /* physical mode */
>>>> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
>>>> +				irq->dest_id == 0xff) {
>>>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +			goto out;
>>>> +		}
>>>> +		dst = &map->phys_map[irq->dest_id & 0xff];
>>>> +	} else {
>>>> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
>>>> +
>>>> +		dst = map->logical_map[apic_cluster_id(map, mda)];
>>>> +
>>>> +		bitmap = apic_logical_id(map, mda);
>>>> +	}
>>>> +
>>>> +	for_each_set_bit(i, &bitmap, 16) {
>>>> +		if (!dst[i])
>>>> +			continue;
>>>> +		if (dst[i]->vcpu == vcpu) {
>>>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +out:
>>>> +	rcu_read_unlock();
>>>> +}
>>> The logic in this function belongs to lapic code. The only thing
>>> that is specific to vmx in the function is setting of the bit in
>>> vmx->eoi_exit_bitmap, but since eoi_exit_bitmap is calculated and
>>> loaded during same vcpu entry we do not need vmx->eoi_exit_bitmap at
>>> all. Declare it on a stack in vmx_update_eoi_exitmap() and pass it to
>>> set_eoi_exitmap() and vmx_load_eoi_exitmap().
>> IIRC, this logic is in lapic before v7. And you suggested to move the
>> whole function into vmx code. So, it better to move back to lapic file?
>> 
> IIRC I suggested to call it only from vmx, not move it there. Before
> that the calculation was done even with vid disabled and only result was
> ignored. With current logic KVM_REQ_EOIBITMAP will be set only with vid
> enabled so the calculation will not be done needlessly.
Maybe I misread your comments. :)
Yes, it is more reasonable to put it in the lapic code.

> 
>>>> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct
> kvm_ioapic
>>> *ioapic)
>>>>  	smp_wmb();
>>>>  }
>>>> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
>>>> +{
>>> This function is exported from the file and need to have more unique
>>> name. kvm_ioapic_calculate_eoi_exitmap() for instance.
>> Ok.
>> 
>>>> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic
>>> *ioapic, u32 val)
>>>>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>>>>  		    && ioapic->irr & (1 << index))
>>>>  			ioapic_service(ioapic, index);
>>>> +		ioapic_update_eoi_exitmap(ioapic->kvm);
>>> ioapic_write_indirect() is called under ioapic->lock,
>>> ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
>>> code?
>> ioapic_update_eoi_exitmap doesn't take any lock.
>> 
> Sorry. You are correct. Confused between different functions.
> 
>> I will do a full testing for every patch before sending out. It covers
>> both windows and Linux guest.
>> 
> We are getting close so please test with userspace irq chip too.
Sure.


Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-16 10:21 ` [PATCH v11 2/3] x86, apicv: add virtual x2apic support Yang Zhang
  2013-01-17 13:22   ` Gleb Natapov
@ 2013-01-21 19:59   ` Marcelo Tosatti
  2013-01-21 20:21     ` Gleb Natapov
  2013-01-22 12:21     ` Zhang, Yang Z
  1 sibling, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 19:59 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, gleb, haitao.shan, xiantao.zhang, Kevin Tian

On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Basically, to benefit from apicv, we need to enable virtualized x2apic mode.
> Currently, we only enable it when the guest is really using x2apic.
> 
> Also, clear the MSR bitmap for the corresponding x2apic MSRs when the guest enables x2apic:
>     0x800 - 0x8ff: no read intercept for apicv register virtualization,
>                    except APIC ID and TMCCT, which need software's assistance to
> 		   get the right value.
> 
> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/include/asm/vmx.h      |    1 +
>  arch/x86/kvm/lapic.c            |   20 ++--
>  arch/x86/kvm/lapic.h            |    5 +
>  arch/x86/kvm/svm.c              |    6 +
>  arch/x86/kvm/vmx.c              |  204 +++++++++++++++++++++++++++++++++++----
>  6 files changed, 209 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c431b33..35aa8e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -697,6 +697,7 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> +	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 44c3f7e..0a54df0 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -139,6 +139,7 @@
>  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
>  #define SECONDARY_EXEC_ENABLE_EPT               0x00000002
>  #define SECONDARY_EXEC_RDTSCP			0x00000008
> +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010
>  #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
>  #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
>  #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0664c13..f39aee3 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>  	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>  	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>  
> -static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> -{
> -	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> -}
> -
>  static inline int kvm_apic_id(struct kvm_lapic *apic)
>  {
>  	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> @@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>  	if (!kvm_vcpu_is_bsp(apic->vcpu))
>  		value &= ~MSR_IA32_APICBASE_BSP;
>  
> -	vcpu->arch.apic_base = value;
> -	if (apic_x2apic_mode(apic)) {
> -		u32 id = kvm_apic_id(apic);
> -		u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> -		kvm_apic_set_ldr(apic, ldr);
> +	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
> +		if (value & X2APIC_ENABLE) {
> +			u32 id = kvm_apic_id(apic);
> +			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> +			kvm_apic_set_ldr(apic, ldr);
> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> +		} else
> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
>  	}
> +
> +	vcpu->arch.apic_base = value;

Simpler to have

if (apic_x2apic_mode(apic)) {
	...
	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
} else {
	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
}

Also, it must be done after the assignment of vcpu->arch.apic_base
(this patch has vcpu->arch.apic_base being read from the
->set_virtual_x2apic_mode() path).
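Ordering-wise that would mean something like this (a sketch only, not the
final code):

	u64 old_value = vcpu->arch.apic_base;

	vcpu->arch.apic_base = value;	/* assign before notifying vmx/svm */
	if ((old_value ^ value) & X2APIC_ENABLE) {
		if (value & X2APIC_ENABLE) {
			u32 id = kvm_apic_id(apic);
			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
			kvm_apic_set_ldr(apic, ldr);
		}
		kvm_x86_ops->set_virtual_x2apic_mode(vcpu,
						!!(value & X2APIC_ENABLE));
	}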

> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long *msr_bitmap;
> +
> +	if (apic_x2apic_mode(vcpu->arch.apic))

vcpu->arch.apic can be NULL.
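A guarded lookup could look along these lines (a sketch; whether
irqchip_in_kernel() is the right condition here is an assumption):

static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
{
	unsigned long *msr_bitmap;

	/* vcpu->arch.apic is NULL with a userspace irqchip */
	if (irqchip_in_kernel(vcpu->kvm) &&
	    apic_x2apic_mode(vcpu->arch.apic))
		msr_bitmap = is_long_mode(vcpu) ?
				vmx_msr_bitmap_longmode_x2apic :
				vmx_msr_bitmap_legacy_x2apic;
	else
		msr_bitmap = is_long_mode(vcpu) ?
				vmx_msr_bitmap_longmode :
				vmx_msr_bitmap_legacy;

	vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
}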

> +static void vmx_intercept_for_msr_read_x2apic(u32 msr, bool set)
> +{
> +	if (set) {
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_R);
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_R);
> +	} else {
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_R);
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_R);
> +	}
> +}

Please retain the enable_intercept/disable_intercept naming in the
function name, instead of a set parameter.
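I.e., split along these lines (a sketch of the naming only):

static void vmx_enable_intercept_msr_read_x2apic(u32 msr)
{
	__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
			msr, MSR_TYPE_R);
	__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
			msr, MSR_TYPE_R);
}

static void vmx_disable_intercept_msr_read_x2apic(u32 msr)
{
	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
			msr, MSR_TYPE_R);
	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
			msr, MSR_TYPE_R);
}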

> +static void vmx_intercept_for_msr_write_x2apic(u32 msr, bool set)
> +{
> +	if (set) {
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_W);
> +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_W);
> +	} else {
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> +				msr, MSR_TYPE_W);
> +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> +				msr, MSR_TYPE_W);
> +	}
>  }

Same here.

> @@ -3848,6 +3950,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>  	if (!enable_apicv_reg)
>  		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> +	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
>  	return exec_control;

Unconditionally disabling SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE? It's
awkward.

> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	/* There is not point to enable virtualize x2apic without enable
> +	 * apicv*/
> +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
> +		return;
> +
> +	if (set) {
> +		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> +		/* virtualize x2apic mode relies on tpr shadow */
> +		if (!(exec_control & CPU_BASED_TPR_SHADOW))
> +			return;
> +	}
> +
> +	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> +
> +	if (set) {
> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> +		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +	} else {
> +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
> +			sec_exec_control |=
> +					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> +	}
> +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
> +
> +	if (set) {
> +		for (msr = 0x800; msr <= 0x8ff; msr++)
> +			vmx_intercept_for_msr_read_x2apic(msr, false);
> +
> +		/* According SDM, in x2apic mode, the whole id reg is used.
> +		 * But in KVM, it only use the highest eight bits. Need to
> +		 * intercept it */
> +		vmx_intercept_for_msr_read_x2apic(0x802, true);
> +		/* TMCCT */
> +		vmx_intercept_for_msr_read_x2apic(0x839, true);
> +		/* TPR */
> +		vmx_intercept_for_msr_write_x2apic(0x808, false);
> +	}

Why not disable the write intercept for all MSRs which represent APIC
registers that are virtualized? Why is TPR special?




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 19:59   ` Marcelo Tosatti
@ 2013-01-21 20:21     ` Gleb Natapov
  2013-01-21 21:21       ` Marcelo Tosatti
  2013-01-22 12:21     ` Zhang, Yang Z
  1 sibling, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-21 20:21 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 05:59:07PM -0200, Marcelo Tosatti wrote:
> On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
> > From: Yang Zhang <yang.z.zhang@Intel.com>
> > 
> > basically to benefit from apicv, we need to enable virtualized x2apic mode.
> > Currently, we only enable it when guest is really using x2apic.
> > 
> > Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
> >     0x800 - 0x8ff: no read intercept for apicv register virtualization,
> >                    except APIC ID and TMCCT which need software's assistance to
> > 		   get right value.
> > 
> > Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |    1 +
> >  arch/x86/include/asm/vmx.h      |    1 +
> >  arch/x86/kvm/lapic.c            |   20 ++--
> >  arch/x86/kvm/lapic.h            |    5 +
> >  arch/x86/kvm/svm.c              |    6 +
> >  arch/x86/kvm/vmx.c              |  204 +++++++++++++++++++++++++++++++++++----
> >  6 files changed, 209 insertions(+), 28 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index c431b33..35aa8e6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -697,6 +697,7 @@ struct kvm_x86_ops {
> >  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
> >  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> >  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> > +	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
> >  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> >  	int (*get_tdp_level)(void);
> >  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> > index 44c3f7e..0a54df0 100644
> > --- a/arch/x86/include/asm/vmx.h
> > +++ b/arch/x86/include/asm/vmx.h
> > @@ -139,6 +139,7 @@
> >  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
> >  #define SECONDARY_EXEC_ENABLE_EPT               0x00000002
> >  #define SECONDARY_EXEC_RDTSCP			0x00000008
> > +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010
> >  #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
> >  #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
> >  #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 0664c13..f39aee3 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
> >  	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
> >  	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
> >  
> > -static inline int apic_x2apic_mode(struct kvm_lapic *apic)
> > -{
> > -	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
> > -}
> > -
> >  static inline int kvm_apic_id(struct kvm_lapic *apic)
> >  {
> >  	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> > @@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> >  	if (!kvm_vcpu_is_bsp(apic->vcpu))
> >  		value &= ~MSR_IA32_APICBASE_BSP;
> >  
> > -	vcpu->arch.apic_base = value;
> > -	if (apic_x2apic_mode(apic)) {
> > -		u32 id = kvm_apic_id(apic);
> > -		u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > -		kvm_apic_set_ldr(apic, ldr);
> > +	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
> > +		if (value & X2APIC_ENABLE) {
> > +			u32 id = kvm_apic_id(apic);
> > +			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > +			kvm_apic_set_ldr(apic, ldr);
> > +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > +		} else
> > +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> >  	}
> > +
> > +	vcpu->arch.apic_base = value;
> 
> Simpler to have
> 
> if (apic_x2apic_mode(apic)) {
> 	...
> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> } else {
> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> }
> 
This will not work during cpu init. That was discussed on one of
the previous iterations of the patch. When this code is called during
vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
write into it.
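
For illustration, a condensed sketch of why the toggle check in the patch
sidesteps this (same expressions as the patch above, eliding the LDR
update):

	/* old ^ new has X2APIC_ENABLE set only on a real mode switch */
	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE)
		/* never reached at vcpu init: both the old and the new
		 * value have the bit clear, so the not-yet-loaded VMCS
		 * is never written */
		kvm_x86_ops->set_virtual_x2apic_mode(vcpu,
						     value & X2APIC_ENABLE);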

> Also it must be done after assignment of vcpu->arch.apic_base (this
> patch has vcpu->arch.apic_base being read from
> ->set_virtual_x2apic_mode() path).
> 
> > +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> > +{
> > +	unsigned long *msr_bitmap;
> > +
> > +	if (apic_x2apic_mode(vcpu->arch.apic))
> 
> vcpu->arch.apic can be NULL.
> 
> > +static void vmx_intercept_for_msr_read_x2apic(u32 msr, bool set)
> > +{
> > +	if (set) {
> > +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> > +				msr, MSR_TYPE_R);
> > +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> > +				msr, MSR_TYPE_R);
> > +	} else {
> > +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> > +				msr, MSR_TYPE_R);
> > +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> > +				msr, MSR_TYPE_R);
> > +	}
> > +}
> 
> Please retain the enable_intercept/disable_intercept naming in the
> function name, instead of a set parameter.
> 
> > +static void vmx_intercept_for_msr_write_x2apic(u32 msr, bool set)
> > +{
> > +	if (set) {
> > +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> > +				msr, MSR_TYPE_W);
> > +		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> > +				msr, MSR_TYPE_W);
> > +	} else {
> > +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
> > +				msr, MSR_TYPE_W);
> > +		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
> > +				msr, MSR_TYPE_W);
> > +	}
> >  }
> 
> Same here.
> 
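
For illustration, a minimal sketch of the requested shape (the helper
names here are guesses, not taken from the posted patch):

	static void vmx_enable_intercept_msr_read_x2apic(u32 msr)
	{
		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
				msr, MSR_TYPE_R);
		__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
				msr, MSR_TYPE_R);
	}

	static void vmx_disable_intercept_msr_read_x2apic(u32 msr)
	{
		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
				msr, MSR_TYPE_R);
		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
				msr, MSR_TYPE_R);
	}

	/* and the same pair again for writes, with MSR_TYPE_W */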
> > @@ -3848,6 +3950,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
> >  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
> >  	if (!enable_apicv_reg)
> >  		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> > +	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> >  	return exec_control;
> 
> Unconditionally disabling SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE? It's
> awkward.
> 
That is how vmx_secondary_exec_control works though. It takes what can
be set in secondary exec control and drops what should not be set there.
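
Schematically (a sketch of the pattern, not the verbatim function):

	static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
	{
		u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;

		/* ...other feature checks elided... */
		if (!enable_apicv_reg)
			exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
		/* off by default; flipped on per vcpu later by
		 * vmx_set_virtual_x2apic_mode() */
		exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
		return exec_control;
	}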

> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	/* There is no point in enabling virtualize x2apic without
> > +	 * enabling apicv */
> > +	if (!cpu_has_vmx_virtualize_x2apic_mode() || !enable_apicv_reg)
> > +		return;
> > +
> > +	if (set) {
> > +		exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> > +		/* virtualize x2apic mode relies on tpr shadow */
> > +		if (!(exec_control & CPU_BASED_TPR_SHADOW))
> > +			return;
> > +	}
> > +
> > +	sec_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > +
> > +	if (set) {
> > +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> > +		sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> > +	} else {
> > +		sec_exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> > +		if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
> > +			sec_exec_control |=
> > +					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> > +	}
> > +	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
> > +
> > +	if (set) {
> > +		for (msr = 0x800; msr <= 0x8ff; msr++)
> > +			vmx_intercept_for_msr_read_x2apic(msr, false);
> > +
> > +		/* According to the SDM, in x2apic mode the whole id reg is used.
> > +		 * But KVM only uses the highest eight bits. Need to
> > +		 * intercept it */
> > +		vmx_intercept_for_msr_read_x2apic(0x802, true);
> > +		/* TMCCT */
> > +		vmx_intercept_for_msr_read_x2apic(0x839, true);
> > +		/* TPR */
> > +		vmx_intercept_for_msr_write_x2apic(0x808, false);
> > +	}
> 
> Why not disable the write intercept for all MSRs which represent APIC
> registers that are virtualized? Why is TPR special?
> 
This patch goes in before vid is enabled. At this point only TPR is
virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
write then we can disable the intercept for all x2apic MSRs here.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-16 10:21 ` [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
  2013-01-17  1:26   ` Zhang, Yang Z
@ 2013-01-21 21:05   ` Marcelo Tosatti
  2013-01-21 21:18     ` Gleb Natapov
  1 sibling, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 21:05 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, gleb, haitao.shan, xiantao.zhang, Kevin Tian

On Wed, Jan 16, 2013 at 06:21:12PM +0800, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Virtual interrupt delivery avoids KVM having to inject vAPIC interrupts
> manually; this is fully taken care of by the hardware. This needs
> some special awareness in the existing interrupt injection path:
> 
> - For a pending interrupt, instead of direct injection, we may need to
>   update architecture-specific indicators before resuming to the guest.
> 
> - A pending interrupt that is masked by the ISR should also be
>   considered in the above update, since the hardware decides when
>   to inject it. Currently has_interrupt and get_interrupt only
>   return a valid vector from the injection p.o.v.
> 
> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  arch/ia64/kvm/lapic.h           |    6 ++
>  arch/x86/include/asm/kvm_host.h |    7 ++
>  arch/x86/include/asm/vmx.h      |   11 +++
>  arch/x86/kvm/irq.c              |   56 +++++++++++-
>  arch/x86/kvm/lapic.c            |   69 +++++++++------
>  arch/x86/kvm/lapic.h            |   23 +++++
>  arch/x86/kvm/svm.c              |   25 +++++
>  arch/x86/kvm/vmx.c              |  184 +++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/x86.c              |   11 ++-
>  arch/x86/kvm/x86.h              |    2 +
>  include/linux/kvm_host.h        |    3 +
>  virt/kvm/ioapic.c               |   38 ++++++++
>  virt/kvm/ioapic.h               |    1 +
>  virt/kvm/irq_comm.c             |   25 +++++
>  virt/kvm/kvm_main.c             |    5 +
>  15 files changed, 421 insertions(+), 45 deletions(-)
> 

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 35aa8e6..d90bfa8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -697,6 +697,12 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> +	int (*has_virtual_interrupt_delivery)(void);
> +	void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);

Call it hwapic_irr_update().

> +	void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
> +	void (*set_svi)(int isr);

SVI is VMX-specific. Call it hwapic_isr_update().
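
Put together, a sketch of the ops after both renames (signatures kept as
posted; only the names change):

	int (*has_virtual_interrupt_delivery)(void);
	void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
	void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
	void (*hwapic_isr_update)(int isr);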

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 526955d..8b9752a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -84,8 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
>  static bool __read_mostly fasteoi = 1;
>  module_param(fasteoi, bool, S_IRUGO);
>  
> -static bool __read_mostly enable_apicv_reg = 1;
> -module_param(enable_apicv_reg, bool, S_IRUGO);
> +static bool __read_mostly enable_apicv_reg_vid = 1;
> +module_param(enable_apicv_reg_vid, bool, S_IRUGO);
>  
>  /*
>   * If nested=1, nested virtualization is supported, i.e., guests may use

> @@ -2783,8 +2793,14 @@ static __init int hardware_setup(void)
>  	if (!cpu_has_vmx_ple())
>  		ple_gap = 0;
>  
> -	if (!cpu_has_vmx_apic_register_virt())
> -		enable_apicv_reg = 0;
> +	if (!cpu_has_vmx_apic_register_virt() ||
> +				!cpu_has_vmx_virtual_intr_delivery())
> +		enable_apicv_reg_vid = 0;
> +
> +	if (enable_apicv_reg_vid)
> +		kvm_x86_ops->update_cr8_intercept = NULL;
> +	else
> +		kvm_x86_ops->update_apic_irq = NULL;

Please be consistent regarding ->update_apic_irq: if HW does not support 
acceleration, or HW acceleration has been disabled, it is NULL (SVM
should be NULL, then).

If so, then please delete the enable_apicv_reg_vid check in vmx_set_svi.

> +static void vmx_set_svi(int isr)
> +{
> +	u16 status;
> +	u8 old;
> +
> +	if (!enable_apicv_reg_vid)
> +		return;

See above.

> +static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu,
> +				u32 vector)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (WARN_ONCE((vector > 255),
> +		"KVM VMX: vector (%d) out of range\n", vector))
> +		return;

Please remove the WARN_ON, inject triple fault if this happens
(and add a tracepoint).

> +
> +	__set_bit(vector, (unsigned long *)vmx->eoi_exit_bitmap);
> +}
> +
> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
> +				   struct kvm_lapic_irq *irq)
> +{
> +	struct kvm_lapic **dst;
> +	struct kvm_apic_map *map;
> +	unsigned long bitmap = 1;
> +	int i;
> +
> +	rcu_read_lock();
> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
> +
> +	if (unlikely(!map)) {
> +		set_eoi_exitmap_one(vcpu, irq->vector);
> +		goto out;
> +	}
> +
> +	if (irq->dest_mode == 0) { /* physical mode */
> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
> +				irq->dest_id == 0xff) {
> +			set_eoi_exitmap_one(vcpu, irq->vector);
> +			goto out;
> +		}
> +		dst = &map->phys_map[irq->dest_id & 0xff];
> +	} else {
> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
> +
> +		dst = map->logical_map[apic_cluster_id(map, mda)];
> +
> +		bitmap = apic_logical_id(map, mda);
> +	}
> +
> +	for_each_set_bit(i, &bitmap, 16) {
> +		if (!dst[i])
> +			continue;
> +		if (dst[i]->vcpu == vcpu) {
> +			set_eoi_exitmap_one(vcpu, irq->vector);
> +			break;
> +		}
> +	}
> +
> +out:
> +	rcu_read_unlock();
> +}

All this can be moved to generic code, then

hwapic_vector_intercept_on_eoi() becomes a callback to kvm_x86_ops.
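
Roughly (a sketch of the suggested split; the callback signature is a
guess):

	/* arch/x86/include/asm/kvm_host.h */
	void (*hwapic_vector_intercept_on_eoi)(struct kvm_vcpu *vcpu,
					       int vector);

	/* generic code keeps the apic-map walk above and, when the irq
	 * targets this vcpu, lets the vendor code mark the vector for
	 * EOI-induced exits: */
	kvm_x86_ops->hwapic_vector_intercept_on_eoi(vcpu, irq->vector);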

> index e224f7a..e8e7b6a 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -122,6 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>  	gva_t addr, void *val, unsigned int bytes,
>  	struct x86_exception *exception);
>  
> +void set_eoi_exitmap(struct kvm_vcpu *vcpu);
> +

This should be in virt/kvm/ioapic.h.

> @@ -82,5 +82,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq);
>  int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
>  int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
> +void ioapic_update_eoi_exitmap(struct kvm *kvm);

prefix with kvm_


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21  5:03         ` Gleb Natapov
  2013-01-21  7:18           ` Zhang, Yang Z
@ 2013-01-21 21:08           ` Marcelo Tosatti
  2013-01-23  0:45           ` Zhang, Yang Z
  2 siblings, 0 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 21:08 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Zhang, Yang Z, kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

On Mon, Jan 21, 2013 at 07:03:00AM +0200, Gleb Natapov wrote:
> > >> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
> > >>  	smp_wmb();
> > >>  }
> > >> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
> > >> +{
> > > This function is exported from the file and needs to have a more unique
> > > name. kvm_ioapic_calculate_eoi_exitmap(), for instance.
> > Ok.
> > 
> > >> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
> > >>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
> > >>  		    && ioapic->irr & (1 << index))
> > >>  			ioapic_service(ioapic, index);
> > >> +		ioapic_update_eoi_exitmap(ioapic->kvm);
> > > ioapic_write_indirect() is called under ioapic->lock,
> > > ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
> > > code?
> > ioapic_update_eoi_exitmap doesn't take any lock.
> > 
> Sorry. You are correct. Confused between different functions.

It needs a more descriptive name, then.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21 21:05   ` Marcelo Tosatti
@ 2013-01-21 21:18     ` Gleb Natapov
  2013-01-21 21:54       ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-21 21:18 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 07:05:43PM -0200, Marcelo Tosatti wrote:
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (WARN_ONCE((vector > 255),
> > +		"KVM VMX: vector (%d) out of range\n", vector))
> > +		return;
> 
> Please remove the WARN_ON, inject triple fault if this happens
> (and add a tracepoint).
> 
Since this cannot be triggered by a guest, a triple fault is unwarranted.
The vector can be greater than 255 here only due to a KVM bug, so WARN() or
even BUG() is appropriate.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 20:21     ` Gleb Natapov
@ 2013-01-21 21:21       ` Marcelo Tosatti
  2013-01-21 21:34         ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 21:21 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
> > >  	}
> > > +
> > > +	vcpu->arch.apic_base = value;
> > 
> > Simpler to have
> > 
> > if (apic_x2apic_mode(apic)) {
> > 	...
> > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > } else {
> > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> > }
> > 
> This will not work during cpu init. That was discussed on one of
> the previous iterations of the patch. When this code is called during
> vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
> write into it.

Are you saying that the logic to write on bit value change is due to 
ordering with cpu init or that the callback is at the wrong place?

> > 
> > Why not disable the write intercept for all MSRs which represent APIC
> > registers that are virtualized? Why is TPR special?
> > 
> This patch goes in before vid is enabled. At this point only TPR is
> virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
> write then we can disable the intercept for all x2apic MSRs here.

-ENOPARSE, please be more verbose. "unhandled MSR write" ?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 21:21       ` Marcelo Tosatti
@ 2013-01-21 21:34         ` Gleb Natapov
  2013-01-21 22:16           ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-21 21:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 07:21:13PM -0200, Marcelo Tosatti wrote:
> On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
> > > >  	}
> > > > +
> > > > +	vcpu->arch.apic_base = value;
> > > 
> > > Simpler to have
> > > 
> > > if (apic_x2apic_mode(apic)) {
> > > 	...
> > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > > } else {
> > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> > > }
> > > 
> > This will not work during cpu init. That was discussed on one of
> > the previous iterations of the patch. When this code is called during
> > vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
> > write into it.
> 
> Are you saying that the logic to write on bit value change is due to 
> ordering with cpu init or that the callback is at the wrong place?
> 
The logic is because of ordering with cpu init.

> > > 
> > > Why not disable the write intercept for all MSRs which represent APIC
> > > registers that are virtualized? Why is TPR special?
> > > 
> > This patch goes in before vid is enabled. At this point only TPR is
> > virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
> > write then we can disable the intercept for all x2apic MSRs here.
> 
> -ENOPARSE, please be more verbose. "unhandled MSR write" ?
By unhandled I mean unintercepted. A write to an x2apic MSR will either
generate an MSR write exit, if the msr bitmap says so, and then the x2apic
MSR will be handled just like today; or, if we disable the intercept, it
will generate an APIC_WRITE exit (need to consult the SDM here, not sure
about it). Neither is really preferable to the other.
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21 21:18     ` Gleb Natapov
@ 2013-01-21 21:54       ` Marcelo Tosatti
  2013-01-22  3:38         ` Zhang, Yang Z
  0 siblings, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 21:54 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 11:18:28PM +0200, Gleb Natapov wrote:
> On Mon, Jan 21, 2013 at 07:05:43PM -0200, Marcelo Tosatti wrote:
> > > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > +
> > > +	if (WARN_ONCE((vector > 255),
> > > +		"KVM VMX: vector (%d) out of range\n", vector))
> > > +		return;
> > 
> > Please remove the WARN_ON, inject triple fault if this happens
> > (and add a tracepoint).
> > 
> Since this cannot be triggered by a guest, a triple fault is unwarranted.
> The vector can be greater than 255 here only due to a KVM bug, so WARN() or
> even BUG() is appropriate.

Then BUG_ON().
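
i.e., a sketch with the agreed-on check (the rest of the helper as
posted):

	static void set_eoi_exitmap_one(struct kvm_vcpu *vcpu, u32 vector)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		/* only reachable through a KVM bug, never guest-triggered */
		BUG_ON(vector > 255);

		__set_bit(vector, (unsigned long *)vmx->eoi_exit_bitmap);
	}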


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 21:34         ` Gleb Natapov
@ 2013-01-21 22:16           ` Marcelo Tosatti
  2013-01-22  0:33             ` Marcelo Tosatti
  2013-01-22  9:42             ` Gleb Natapov
  0 siblings, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-21 22:16 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 11:34:20PM +0200, Gleb Natapov wrote:
> On Mon, Jan 21, 2013 at 07:21:13PM -0200, Marcelo Tosatti wrote:
> > On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
> > > > >  	}
> > > > > +
> > > > > +	vcpu->arch.apic_base = value;
> > > > 
> > > > Simpler to have
> > > > 
> > > > if (apic_x2apic_mode(apic)) {
> > > > 	...
> > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > > > } else {
> > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> > > > }
> > > > 
> > > This will not work during cpu init. That was discussed on one of
> > > the previous iterations of the patch. When this code is called during
> > > vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
> > > write into it.
> > 
> > Are you saying that the logic to write on bit value change is due to 
> > ordering with cpu init or that the callback is at the wrong place?
> > 
> The logic is because of ordering with cpu init.

OK. Still must move this conditional callback after assignment of apic_base.

> > > > Why not disable the write intercept for all MSRs which represent APIC
> > > > registers that are virtualized? Why is TPR special?
> > > > 
> > > This patch goes in before vid is enabled. At this point only TPR is
> > > virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
> > > write then we can disable the intercept for all x2apic MSRs here.
> > 
> > -ENOPARSE, please be more verbose. "unhandled MSR write" ?
> > By unhandled I mean unintercepted. A write to an x2apic MSR will either
> > generate an MSR write exit, if the msr bitmap says so, and then the x2apic
> > MSR will be handled just like today; or, if we disable the intercept, it
> > will generate an APIC_WRITE exit (need to consult the SDM here, not sure
> > about it). Neither is really preferable to the other.

It will generate APIC_WRITE, for example, when exiting due to EOI
virtualization.

The question is, why is the intercept for the EOI MSR address (0x80B) not
being disabled here, while TPR's is? I don't see the intercept disabled by
the other patches either.

APIC_WRITE exit (conditional, see EOI virtualization page) is 
preferable to MSR write exit (unconditional).


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 22:16           ` Marcelo Tosatti
@ 2013-01-22  0:33             ` Marcelo Tosatti
  2013-01-22  3:37               ` Zhang, Yang Z
  2013-01-22  9:45               ` Gleb Natapov
  2013-01-22  9:42             ` Gleb Natapov
  1 sibling, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-22  0:33 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 08:16:18PM -0200, Marcelo Tosatti wrote:
> On Mon, Jan 21, 2013 at 11:34:20PM +0200, Gleb Natapov wrote:
> > On Mon, Jan 21, 2013 at 07:21:13PM -0200, Marcelo Tosatti wrote:
> > > On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
> > > > > >  	}
> > > > > > +
> > > > > > +	vcpu->arch.apic_base = value;
> > > > > 
> > > > > Simpler to have
> > > > > 
> > > > > if (apic_x2apic_mode(apic)) {
> > > > > 	...
> > > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > > > > } else {
> > > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> > > > > }
> > > > > 
> > > > This will not work during cpu init. That was discussed on one of
> > > > the previous iterations of the patch. When this code is called during
> > > > vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
> > > > write into it.
> > > 
> > > Are you saying that the logic to write on bit value change is due to 
> > > ordering with cpu init or that the callback is at the wrong place?
> > > 
> > The logic is because of ordering with cpu init.
> 
> OK. Still must move this conditional callback after assignment of apic_base.
> 
> > > > > Why not disable the write intercept for all MSRs which represent APIC
> > > > > registers that are virtualized? Why is TPR special?
> > > > > 
> > > > This patch goes in before vid is enabled. At this point only TPR is
> > > > virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
> > > > write then we can disable the intercept for all x2apic MSRs here.
> > > 
> > > -ENOPARSE, please be more verbose. "unhandled MSR write" ?
> > By unhandled I mean unintercepted. A write to an x2apic MSR will either
> > generate an MSR write exit, if the msr bitmap says so, and then the x2apic
> > MSR will be handled just like today; or, if we disable the intercept, it
> > will generate an APIC_WRITE exit (need to consult the SDM here, not sure
> > about it). Neither is really preferable to the other.
> 
> It will generate APIC_WRITE, for example, when exiting due to EOI
> virtualization.

Err, no, EOI induced vmexit is due to EOI virtualization.

> The question is, why is the intercept for the EOI MSR address (0x80B) not
> being disabled here, while TPR's is? I don't see the intercept disabled by
> the other patches either.

Point still valid: why is the intercept for the EOI MSR address not being disabled?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22  0:33             ` Marcelo Tosatti
@ 2013-01-22  3:37               ` Zhang, Yang Z
  2013-01-22  9:45               ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-22  3:37 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov
  Cc: kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

Marcelo Tosatti wrote on 2013-01-22:
> On Mon, Jan 21, 2013 at 08:16:18PM -0200, Marcelo Tosatti wrote:
>> On Mon, Jan 21, 2013 at 11:34:20PM +0200, Gleb Natapov wrote:
>>> On Mon, Jan 21, 2013 at 07:21:13PM -0200, Marcelo Tosatti wrote:
>>>> On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
>>>>>>>  	}
>>>>>>> +
>>>>>>> +	vcpu->arch.apic_base = value;
>>>>>> 
>>>>>> Simpler to have
>>>>>> 
>>>>>> if (apic_x2apic_mode(apic)) {
>>>>>> 	...
>>>>>> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
>>>>>> } else {
>>>>>> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
>>>>>> }
>>>>>> 
>>>>> This will not work during cpu init. That was discussed on one of
>>>>> the previous iterations of the patch. When this code is called during
>>>>> vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
>>>>> write into it.
>>>> 
>>>> Are you saying that the logic to write on bit value change is due to
>>>> ordering with cpu init or that the callback is at the wrong place?
>>>> 
>>> The logic is because of ordering with cpu init.
>> 
>> OK. Still must move this conditional callback after assignment of apic_base.
You are correct. Will move it in the next patch.

>>>>>> Why not disable the write intercept for all MSRs which represent APIC
>>>>>> registers that are virtualized? Why is TPR special?
>>>>>> 
>>>>> This patch goes in before vid is enabled. At this point only TPR is
>>>>> virtualized. If an APIC_WRITE exit is generated on an unintercepted MSR
>>>>> write then we can disable the intercept for all x2apic MSRs here.
>>>> 
>>>> -ENOPARSE, please be more verbose. "unhandled MSR write" ?
>>> By unhandled I mean unintercepted. A write to an x2apic MSR will either
>>> generate an MSR write exit, if the msr bitmap says so, and then the x2apic
>>> MSR will be handled just like today; or, if we disable the intercept, it
>>> will generate an APIC_WRITE exit (need to consult the SDM here, not sure
>>> about it). Neither is really preferable to the other.
>> 
>> It will generate APIC_WRITE, for example, when exiting due to EOI
>> virtualization.
> 
> Err, no, EOI induced vmexit is due to EOI virtualization.
> 
>> The question is, why is the intercept for the EOI MSR address (0x80B) not
>> being disabled here, while TPR's is? I don't see the intercept disabled by
>> the other patches either.
TPR shadow is required by virtual x2apic mode. So if virtual x2apic mode is enabled, that means TPR shadow is also enabled, and then we can disable the TPR msr intercept.

> Point still valid: why is the intercept for the EOI MSR address not being disabled?
Please see the third patch. Only when vid is enabled will we disable the EOI msr intercept and self-IPI.

@@ -6249,10 +6286,138 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
                vmx_intercept_for_msr_read_x2apic(0x839, true);
                /* TPR */
                vmx_intercept_for_msr_write_x2apic(0x808, false);
+               /* EOI */
+               vmx_intercept_for_msr_write_x2apic(0x80b, false);
+               /* SELF-IPI */
+               vmx_intercept_for_msr_write_x2apic(0x83f, false);
        }
        vmx_set_msr_bitmap(vcpu);

Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21 21:54       ` Marcelo Tosatti
@ 2013-01-22  3:38         ` Zhang, Yang Z
  0 siblings, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-22  3:38 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov
  Cc: kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

Marcelo Tosatti wrote on 2013-01-22:
> On Mon, Jan 21, 2013 at 11:18:28PM +0200, Gleb Natapov wrote:
>> On Mon, Jan 21, 2013 at 07:05:43PM -0200, Marcelo Tosatti wrote:
>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +
>>>> +	if (WARN_ONCE((vector > 255),
>>>> +		"KVM VMX: vector (%d) out of range\n", vector))
>>>> +		return;
>>> 
>>> Please remove the WARN_ON, inject triple fault if this happens
>>> (and add a tracepoint).
>>> 
>> Since this cannot be triggered by a guest, a triple fault is unwarranted.
>> The vector can be greater than 255 here only due to a KVM bug, so WARN() or
>> even BUG() is appropriate.
> 
> Then BUG_ON().
Sure. 

Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 22:16           ` Marcelo Tosatti
  2013-01-22  0:33             ` Marcelo Tosatti
@ 2013-01-22  9:42             ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2013-01-22  9:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 08:16:18PM -0200, Marcelo Tosatti wrote:
> On Mon, Jan 21, 2013 at 11:34:20PM +0200, Gleb Natapov wrote:
> > On Mon, Jan 21, 2013 at 07:21:13PM -0200, Marcelo Tosatti wrote:
> > > On Mon, Jan 21, 2013 at 10:21:14PM +0200, Gleb Natapov wrote:
> > > > > >  	}
> > > > > > +
> > > > > > +	vcpu->arch.apic_base = value;
> > > > > 
> > > > > Simpler to have
> > > > > 
> > > > > if (apic_x2apic_mode(apic)) {
> > > > > 	...
> > > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> > > > > } else {
> > > > > 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> > > > > }
> > > > > 
> > > > This will not work during cpu init. That was discussed on one of
> > > > the previous iterations of the patch. When this code is called during
> > > > vcpu init vmcs is not loaded yet so set_virtual_x2apic_mode() cannot
> > > > write into it.
> > > 
> > > Are you saying that the logic to write on bit value change is due to 
> > > ordering with cpu init or that the callback is at the wrong place?
> > > 
> > The logic is because of ordering with cpu init.
> 
> OK. Still must move this conditional callback after assignment of apic_base.
> 
Sure.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22  0:33             ` Marcelo Tosatti
  2013-01-22  3:37               ` Zhang, Yang Z
@ 2013-01-22  9:45               ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2013-01-22  9:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yang Zhang, kvm, haitao.shan, xiantao.zhang, Kevin Tian

On Mon, Jan 21, 2013 at 10:33:46PM -0200, Marcelo Tosatti wrote:
> > The question is, why is the intercept for the EOI MSR address (0x80B) not
> > being disabled here, while TPR's is? I don't see the intercept disabled by
> > the other patches either.
> 
> Point still valid: why is the intercept for the EOI MSR address not being disabled?
Yang sent two versions of the third patch. The second one disabled the
intercept for the EOI MSR. Ignore the first one.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-21 19:59   ` Marcelo Tosatti
  2013-01-21 20:21     ` Gleb Natapov
@ 2013-01-22 12:21     ` Zhang, Yang Z
  2013-01-22 15:55       ` Gleb Natapov
  1 sibling, 1 reply; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-22 12:21 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

Marcelo Tosatti wrote on 2013-01-22:
> On Wed, Jan 16, 2013 at 06:21:11PM +0800, Yang Zhang wrote:
>> From: Yang Zhang <yang.z.zhang@Intel.com>
>> 
>> Basically, to benefit from apicv, we need to enable virtualized x2apic mode.
>> Currently, we only enable it when the guest is really using x2apic.
>> 
>> Also, clear MSR bitmap for corresponding x2apic MSRs when guest enables
>> x2apic:
>>     0x800 - 0x8ff: no read intercept for apicv register virtualization,
>>                    except APIC ID and TMCCT which need software's
>>                    assistance to get the right value.
>> 
>> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    1 +
>>  arch/x86/include/asm/vmx.h      |    1 +
>>  arch/x86/kvm/lapic.c            |   20 ++--
>>  arch/x86/kvm/lapic.h            |    5 +
>>  arch/x86/kvm/svm.c              |    6 +
>>  arch/x86/kvm/vmx.c              |  204 +++++++++++++++++++++++++++++++++++----
>>  6 files changed, 209 insertions(+), 28 deletions(-)
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h index c431b33..35aa8e6 100644 ---
>> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -697,6 +697,7 @@ struct kvm_x86_ops {
>>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
>>  +	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>>  	int (*get_tdp_level)(void);
>>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index 44c3f7e..0a54df0 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> @@ -139,6 +139,7 @@
>>  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
>>  #define SECONDARY_EXEC_ENABLE_EPT               0x00000002
>>  #define SECONDARY_EXEC_RDTSCP			0x00000008
>>  +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010
>>  #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
>>  #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
>>  #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 0664c13..f39aee3 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -140,11 +140,6 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>>  	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>>  	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>> -static inline int apic_x2apic_mode(struct kvm_lapic *apic)
>> -{
>> -	return apic->vcpu->arch.apic_base & X2APIC_ENABLE;
>> -}
>> -
>>  static inline int kvm_apic_id(struct kvm_lapic *apic)
>>  {
>>  	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>> @@ -1323,12 +1318,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>>  	if (!kvm_vcpu_is_bsp(apic->vcpu))
>>  		value &= ~MSR_IA32_APICBASE_BSP;
>> -	vcpu->arch.apic_base = value;
>> -	if (apic_x2apic_mode(apic)) {
>> -		u32 id = kvm_apic_id(apic);
>> -		u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
>> -		kvm_apic_set_ldr(apic, ldr);
>> +	if ((vcpu->arch.apic_base ^ value) & X2APIC_ENABLE) {
>> +		if (value & X2APIC_ENABLE) {
>> +			u32 id = kvm_apic_id(apic);
>> +			u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
>> +			kvm_apic_set_ldr(apic, ldr);
>> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
>> +		} else
>> +			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
>>  	}
>> +
>> +	vcpu->arch.apic_base = value;
> 
> Simpler to have
> 
> if (apic_x2apic_mode(apic)) {
> 	...
> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
> } else {
> 	kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
> }
> 
> Also it must be done after assignment of vcpu->arch.apic_base (this
> patch has vcpu->arch.apic_base being read from
> ->set_virtual_x2apic_mode() path).
> 
>> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long *msr_bitmap;
>> +
>> +	if (apic_x2apic_mode(vcpu->arch.apic))
> 
> vcpu->arch.apic can be NULL.
Actually, calling apic_x2apic_mode to check whether to use the x2apic msr bitmap is wrong.
The VCPU may use x2apic without the virtualize x2apic mode bit being set, because TPR shadow is not enabled or the irqchip is not in the kernel. Checking the virtualize x2apic mode bit in the vmcs directly would be a better choice. How about the following code:

static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
{
        unsigned long *msr_bitmap;

        if (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)
                if (is_long_mode(vcpu))
                        msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
                else
                        msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
        else
                if (is_long_mode(vcpu))
                        msr_bitmap = vmx_msr_bitmap_longmode;
                else
                        msr_bitmap = vmx_msr_bitmap_legacy;

        vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
}

Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22 12:21     ` Zhang, Yang Z
@ 2013-01-22 15:55       ` Gleb Natapov
  2013-01-22 23:13         ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-22 15:55 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Marcelo Tosatti, kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

On Tue, Jan 22, 2013 at 12:21:47PM +0000, Zhang, Yang Z wrote:
> >> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> >> +{
> >> +	unsigned long *msr_bitmap;
> >> +
> >> +	if (apic_x2apic_mode(vcpu->arch.apic))
> > 
> > vcpu->arch.apic can be NULL.
> > Actually, calling apic_x2apic_mode to check whether to use the x2apic msr bitmap is wrong.
> > The VCPU may use x2apic without the virtualize x2apic mode bit being set, because TPR shadow is not enabled or the irqchip is not in the kernel. Checking the virtualize x2apic mode bit in the vmcs directly would be a better choice. How about the following code:
> 
If TPR shadow is not enabled, the vmx_msr_bitmap_.*x2apic bitmaps will have the x2apic MSRs intercepted.

> static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> {
>         unsigned long *msr_bitmap;
> 
>         if (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)
>                 if (is_long_mode(vcpu))
>                         msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>                 else
>                         msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
>         else
>                 if (is_long_mode(vcpu))
>                         msr_bitmap = vmx_msr_bitmap_longmode;
>                 else
>                         msr_bitmap = vmx_msr_bitmap_legacy;
> 
>         vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
> }
> 
> Best regards,
> Yang
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22 15:55       ` Gleb Natapov
@ 2013-01-22 23:13         ` Marcelo Tosatti
  2013-01-23  0:35           ` Zhang, Yang Z
  2013-01-23 10:29           ` Gleb Natapov
  0 siblings, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2013-01-22 23:13 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Zhang, Yang Z, kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

On Tue, Jan 22, 2013 at 05:55:53PM +0200, Gleb Natapov wrote:
> On Tue, Jan 22, 2013 at 12:21:47PM +0000, Zhang, Yang Z wrote:
> > >> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> > >> +{
> > >> +	unsigned long *msr_bitmap;
> > >> +
> > >> +	if (apic_x2apic_mode(vcpu->arch.apic))
> > > 
> > > vcpu->arch.apic can be NULL.
> > > Actually, calling apic_x2apic_mode to check whether to use the x2apic msr bitmap is wrong.
> > > The VCPU may use x2apic without the virtualize x2apic mode bit being set, because TPR shadow is not enabled or the irqchip is not in the kernel. Checking the virtualize x2apic mode bit in the vmcs directly would be a better choice. How about the following code:
> > 
> If TPR shadow is not enabled, the vmx_msr_bitmap_.*x2apic bitmaps will have the x2apic MSRs intercepted.

And what is the problem? APIC register virt depends on TPR shadow.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22 23:13         ` Marcelo Tosatti
@ 2013-01-23  0:35           ` Zhang, Yang Z
  2013-01-23 10:29           ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-23  0:35 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov
  Cc: kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

Marcelo Tosatti wrote on 2013-01-23:
> On Tue, Jan 22, 2013 at 05:55:53PM +0200, Gleb Natapov wrote:
>> On Tue, Jan 22, 2013 at 12:21:47PM +0000, Zhang, Yang Z wrote:
>>>>> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +	unsigned long *msr_bitmap;
>>>>> +
>>>>> +	if (apic_x2apic_mode(vcpu->arch.apic))
>>>> 
>>>> vcpu->arch.apic can be NULL.
>>> Actually, calling apic_x2apic_mode to check whether to use the x2apic msr
>>> bitmap is wrong. The VCPU may use x2apic without the virtualize x2apic mode
>>> bit being set, because TPR shadow is not enabled or the irqchip is not in
>>> the kernel. Checking the virtualize x2apic mode bit in the vmcs directly
>>> would be a better choice. How about the following code:
>>> 
>> If TPR shadow is not enabled, the vmx_msr_bitmap_.*x2apic bitmaps will
>> have the x2apic MSRs intercepted.
Right. So checking the virtualize x2apic mode bit also covers the TPR shadow check. Otherwise, we need two checks: one for the apic mode and one for TPR shadow.

> And what is the problem? APIC register virt depends on TPR shadow.
No problem. The new implementation is more reasonable: it needn't do an additional check of TPR shadow, and it will not touch apic_base.



Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-21  5:03         ` Gleb Natapov
  2013-01-21  7:18           ` Zhang, Yang Z
  2013-01-21 21:08           ` Marcelo Tosatti
@ 2013-01-23  0:45           ` Zhang, Yang Z
  2013-01-23 10:33             ` Gleb Natapov
  2 siblings, 1 reply; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-23  0:45 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

Gleb Natapov wrote on 2013-01-21:
> On Mon, Jan 21, 2013 at 12:49:01AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2013-01-20:
>>> On Thu, Jan 17, 2013 at 01:26:03AM +0000, Zhang, Yang Z wrote:
>>>> Previous patch is stale. Resend the new patch. The only change is
>>>> clear EOI and SELF-IPI reg in msr bitmap when vid is enabled.
>>>> 
>>>> ------------------------
>>>> @@ -340,6 +325,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic
>>> *apic)
>>>>  {
>>>>  	int result;
>>>> +	/* Note that irr_pending is just a hint. It will be always
>>>> +	 * true with virtual interrupt delivery enabled. */
>>> This is not correct format for multi-line comments.
>> Sure, will correct it here and below.
>> 
>>>> +static void vmx_check_ioapic_entry(struct kvm_vcpu *vcpu,
>>>> +				   struct kvm_lapic_irq *irq)
>>>> +{
>>>> +	struct kvm_lapic **dst;
>>>> +	struct kvm_apic_map *map;
>>>> +	unsigned long bitmap = 1;
>>>> +	int i;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	map = rcu_dereference(vcpu->kvm->arch.apic_map);
>>>> +
>>>> +	if (unlikely(!map)) {
>>>> +		set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (irq->dest_mode == 0) { /* physical mode */
>>>> +		if (irq->delivery_mode == APIC_DM_LOWEST ||
>>>> +				irq->dest_id == 0xff) {
>>>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +			goto out;
>>>> +		}
>>>> +		dst = &map->phys_map[irq->dest_id & 0xff];
>>>> +	} else {
>>>> +		u32 mda = irq->dest_id << (32 - map->ldr_bits);
>>>> +
>>>> +		dst = map->logical_map[apic_cluster_id(map, mda)];
>>>> +
>>>> +		bitmap = apic_logical_id(map, mda);
>>>> +	}
>>>> +
>>>> +	for_each_set_bit(i, &bitmap, 16) {
>>>> +		if (!dst[i])
>>>> +			continue;
>>>> +		if (dst[i]->vcpu == vcpu) {
>>>> +			set_eoi_exitmap_one(vcpu, irq->vector);
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +out:
>>>> +	rcu_read_unlock();
>>>> +}
>>> The logic in this function belongs in lapic code. The only thing
>>> that is specific to vmx in the function is the setting of the bit in
>>> vmx->eoi_exit_bitmap, but since eoi_exit_bitmap is calculated and
>>> loaded during the same vcpu entry we do not need vmx->eoi_exit_bitmap at
>>> all. Declare it on the stack in vmx_update_eoi_exitmap() and pass it to
>>> set_eoi_exitmap() and vmx_load_eoi_exitmap().
>> IIRC, this logic was in lapic before v7. And you suggested moving the
>> whole function into vmx code. So, is it better to move it back to the lapic file?
>> 
> IIRC I suggested calling it only from vmx, not moving it there. Before
> that the calculation was done even with vid disabled and only the result
> was ignored. With the current logic KVM_REQ_EOIBITMAP will be set only
> with vid enabled, so the calculation will not be done needlessly.
> 
> 
>>>> @@ -115,6 +116,42 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
>>>>  	smp_wmb();
>>>>  }
>>>> +void set_eoi_exitmap(struct kvm_vcpu *vcpu)
>>>> +{
>>> This function is exported from the file and needs to have a more unique
>>> name. kvm_ioapic_calculate_eoi_exitmap(), for instance.
>> Ok.
>> 
>>>> @@ -156,6 +193,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>>  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>>>>  		    && ioapic->irr & (1 << index))
>>>>  			ioapic_service(ioapic, index);
>>>> +		ioapic_update_eoi_exitmap(ioapic->kvm);
>>> ioapic_write_indirect() is called under ioapic->lock,
>>> ioapic_update_eoi_exitmap() takes the same lock. Have you tested the
>>> code?
>> ioapic_update_eoi_exitmap doesn't take any lock.
>> 
> Sorry. You are correct. Confused between different functions.
> 
>> I will do full testing of every patch before sending it out. It covers
>> both Windows and Linux guests.
>> 
> We are getting close so please test with userspace irq chip too.
Thanks for your suggestion to test with the userspace irqchip. I found some issues and will modify the logic:
As we know, APICv depends on TPR shadow. But TPR shadow is per VM (it will be disabled when the VM uses a userspace irq chip), which means APICv is also per VM. But in the current implementation, we use the global variable enable_apicv_reg to check whether APICv is used by the target vcpu. This is wrong. Instead, we should read the VMCS to see whether the bit is set or not.

Best regards,
Yang



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 2/3] x86, apicv: add virtual x2apic support
  2013-01-22 23:13         ` Marcelo Tosatti
  2013-01-23  0:35           ` Zhang, Yang Z
@ 2013-01-23 10:29           ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2013-01-23 10:29 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhang, Yang Z, kvm, Shan, Haitao, Zhang, Xiantao, Tian, Kevin

On Tue, Jan 22, 2013 at 09:13:06PM -0200, Marcelo Tosatti wrote:
> On Tue, Jan 22, 2013 at 05:55:53PM +0200, Gleb Natapov wrote:
> > On Tue, Jan 22, 2013 at 12:21:47PM +0000, Zhang, Yang Z wrote:
> > > >> +static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
> > > >> +{
> > > >> +	unsigned long *msr_bitmap;
> > > >> +
> > > >> +	if (apic_x2apic_mode(vcpu->arch.apic))
> > > > 
> > > > vcpu->arch.apic can be NULL.
> > > Actually, calling apic_x2apic_mode to check whether to use the x2apic msr bitmap is wrong.
> > > The VCPU may use x2apic without the virtualize x2apic mode bit being set, because TPR shadow is not enabled or the irqchip is not in the kernel. Checking the virtualize x2apic mode bit in the vmcs directly would be a better choice. How about the following code:
> > > 
> > If TPR shadow is not enabled, the vmx_msr_bitmap_.*x2apic bitmaps will have the x2apic MSRs intercepted.
> 
> And what is the problem? APIC register virt depends on TPR shadow.
No problem. I am saying that it is safe to set the msr bitmap to
vmx_msr_bitmap_.*x2apic in this case, so there is no need to check the vmcs;
apic_x2apic_mode() is sufficient.
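
Putting the thread's conclusions together -- vcpu->arch.apic can be NULL
when the irqchip is not in the kernel, and the x2apic bitmaps already keep
the x2apic MSRs intercepted when TPR shadow is off -- a minimal sketch
(the irqchip_in_kernel() guard is an assumption, not code from this
thread):

	static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
	{
		unsigned long *msr_bitmap;

		if (irqchip_in_kernel(vcpu->kvm) &&
		    apic_x2apic_mode(vcpu->arch.apic)) {
			if (is_long_mode(vcpu))
				msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
			else
				msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
		} else {
			if (is_long_mode(vcpu))
				msr_bitmap = vmx_msr_bitmap_longmode;
			else
				msr_bitmap = vmx_msr_bitmap_legacy;
		}

		vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
	}
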
--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-23  0:45           ` Zhang, Yang Z
@ 2013-01-23 10:33             ` Gleb Natapov
  2013-01-23 10:52               ` Zhang, Yang Z
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2013-01-23 10:33 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

On Wed, Jan 23, 2013 at 12:45:39AM +0000, Zhang, Yang Z wrote:
> > We are getting close so please test with userspace irq chip too.
> Thanks for your suggestion to test with the userspace irqchip. I found some issues and will modify the logic:
> As we know, APICv depends on TPR shadow. But TPR shadow is per VM (it will be disabled when the VM uses a userspace irq chip), which means APICv is also per VM. But in the current implementation, we use the global variable enable_apicv_reg to check whether APICv is used by the target vcpu. This is wrong. Instead, we should read the VMCS to see whether the bit is set or not.
> 
No, apicv and TPR shadow are global for all VMs with the irq chip in the
kernel, and VMs without the irq chip in the kernel skip that APIC code
entirely. If there is generic code that should behave differently with or
without the in-kernel irq chip, make the check for that condition there.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support
  2013-01-23 10:33             ` Gleb Natapov
@ 2013-01-23 10:52               ` Zhang, Yang Z
  0 siblings, 0 replies; 34+ messages in thread
From: Zhang, Yang Z @ 2013-01-23 10:52 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Shan, Haitao, mtosatti, Zhang, Xiantao, Tian, Kevin

Gleb Natapov wrote on 2013-01-23:
> On Wed, Jan 23, 2013 at 12:45:39AM +0000, Zhang, Yang Z wrote:
>>> We are getting close so please test with userspace irq chip too.
>> Thanks for your suggestion to test with the userspace irqchip. I found
>> some issues and will modify the logic: As we know, APICv depends on TPR
>> shadow. But TPR shadow is per VM (it will be disabled when the VM uses a
>> userspace irq chip), which means APICv is also per VM. But in the current
>> implementation, we use the global variable enable_apicv_reg to check
>> whether APICv is used by the target vcpu. This is wrong. Instead, we
>> should read the VMCS to see whether the bit is set or not.
>> 
> No, apicv and TPR shadow are global for all VMs with the irq chip in the
> kernel, and VMs without the irq chip in the kernel skip that APIC code
> entirely. If there is generic code that should behave differently with or
> without the in-kernel irq chip, make the check for that condition there.
Yes, checking irqchip_in_kernel is enough. No need to check the bits in the vmcs. Like this:

static int vmx_vcpu_has_apicv(struct kvm *kvm)
{
        return enable_apicv_reg_vid && irqchip_in_kernel(kvm);
}

Best regards,
Yang


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-01-23 10:52 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-16 10:21 [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
2013-01-16 10:21 ` [PATCH v11 1/3] x86, apicv: add APICv register " Yang Zhang
2013-01-16 10:21 ` [PATCH v11 2/3] x86, apicv: add virtual x2apic support Yang Zhang
2013-01-17 13:22   ` Gleb Natapov
2013-01-18  1:49     ` Zhang, Yang Z
2013-01-21 19:59   ` Marcelo Tosatti
2013-01-21 20:21     ` Gleb Natapov
2013-01-21 21:21       ` Marcelo Tosatti
2013-01-21 21:34         ` Gleb Natapov
2013-01-21 22:16           ` Marcelo Tosatti
2013-01-22  0:33             ` Marcelo Tosatti
2013-01-22  3:37               ` Zhang, Yang Z
2013-01-22  9:45               ` Gleb Natapov
2013-01-22  9:42             ` Gleb Natapov
2013-01-22 12:21     ` Zhang, Yang Z
2013-01-22 15:55       ` Gleb Natapov
2013-01-22 23:13         ` Marcelo Tosatti
2013-01-23  0:35           ` Zhang, Yang Z
2013-01-23 10:29           ` Gleb Natapov
2013-01-16 10:21 ` [PATCH v11 3/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
2013-01-17  1:26   ` Zhang, Yang Z
2013-01-20 12:51     ` Gleb Natapov
2013-01-21  0:49       ` Zhang, Yang Z
2013-01-21  5:03         ` Gleb Natapov
2013-01-21  7:18           ` Zhang, Yang Z
2013-01-21 21:08           ` Marcelo Tosatti
2013-01-23  0:45           ` Zhang, Yang Z
2013-01-23 10:33             ` Gleb Natapov
2013-01-23 10:52               ` Zhang, Yang Z
2013-01-21 21:05   ` Marcelo Tosatti
2013-01-21 21:18     ` Gleb Natapov
2013-01-21 21:54       ` Marcelo Tosatti
2013-01-22  3:38         ` Zhang, Yang Z
2013-01-16 10:27 ` [PATCH v11 0/3] x86, apicv: Add APIC virtualization support Gleb Natapov
