* [PATCH v3 0/2] KVM: VMX: Add Posted Interrupt support
@ 2013-02-19 13:39 Yang Zhang
  2013-02-19 13:39 ` [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit Yang Zhang
  2013-02-19 13:39 ` [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt support Yang Zhang
  0 siblings, 2 replies; 16+ messages in thread
From: Yang Zhang @ 2013-02-19 13:39 UTC (permalink / raw)
  To: kvm; +Cc: gleb, mtosatti, Yang Zhang

From: Yang Zhang <yang.z.zhang@Intel.com>

These two patches add Posted Interrupt support to KVM:
The first patch enables the "acknowledge interrupt on vmexit" feature. Since
it is required by Posted Interrupts, we need to enable it first.

The second patch adds the Posted Interrupt support itself.

Please see the comments in the two patches for more details.

Changes from v2 to v3:
* Add a CONFIG_HAVE_KVM check when calling BUILD_INTERRUPT for the posted
  interrupt vector.
* Rename send_notification_event() to deliver_posted_interrupt().
* Modify the algorithm for detecting interrupt coalescing: an interrupt is
  considered delivered only when there is no previous interrupt pending in
  either the irr or the pir (see the sketch after this list).
* Stop using a new request to sync the pir to the irr; use KVM_REQ_EVENT
  instead.
* Remove the optimization of checking the outstanding notification bit after
  sending a posted interrupt to a remote cpu.
* Embed pi_desc inside struct vcpu_vmx instead of allocating it dynamically.
* Rebase on top of KVM upstream.
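
As a rough illustration of the new coalescing rule, the check amounts to the
following sketch (the helper name is hypothetical; in the patches the real
logic is open-coded in vmx_deliver_posted_interrupt(), using the
kvm_apic_test_irr() helper and the pi_desc pir bitmap that patch 2 adds):

/*
 * Sketch only: an interrupt counts as newly delivered (not coalesced)
 * when its vector was pending in neither the APIC irr nor the pir.
 */
static bool pi_intr_is_coalesced(struct kvm_lapic *apic,
				 struct pi_desc *pi_desc, int vector)
{
	return kvm_apic_test_irr(vector, apic) ||
	       test_bit(vector, (unsigned long *)pi_desc->pir);
}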

Yang Zhang (2):
  KVM: VMX: enable acknowledge interrupt on vmexit
  KVM: VMX: Add Posted Interrupt support

 arch/x86/include/asm/entry_arch.h  |    4 +
 arch/x86/include/asm/hw_irq.h      |    1 +
 arch/x86/include/asm/irq_vectors.h |    5 +
 arch/x86/include/asm/kvm_host.h    |    4 +
 arch/x86/include/asm/vmx.h         |    4 +
 arch/x86/kernel/entry_64.S         |    5 +
 arch/x86/kernel/irq.c              |   20 +++
 arch/x86/kernel/irqinit.c          |    4 +
 arch/x86/kvm/lapic.c               |   19 +++-
 arch/x86/kvm/lapic.h               |    1 +
 arch/x86/kvm/svm.c                 |   19 +++
 arch/x86/kvm/vmx.c                 |  226 ++++++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.c                 |    5 +-
 13 files changed, 289 insertions(+), 28 deletions(-)



* [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-19 13:39 [PATCH v3 0/2] KVM: VMX: Add Posted Interrupt support Yang Zhang
@ 2013-02-19 13:39 ` Yang Zhang
  2013-02-19 17:35   ` Avi Kivity
  2013-02-19 13:39 ` [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt support Yang Zhang
  1 sibling, 1 reply; 16+ messages in thread
From: Yang Zhang @ 2013-02-19 13:39 UTC (permalink / raw)
  To: kvm; +Cc: gleb, mtosatti, Yang Zhang

From: Yang Zhang <yang.z.zhang@Intel.com>

The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.

After enabling this feature, an interrupt that arrives while the target
cpu is running in VMX non-root mode is handled by the VMX handler instead
of the handler in the IDT. Currently, the VMX handler only fakes an
interrupt stack and jumps to the IDT table to let the real handler handle
it. In the future, we will recognize the interrupt and deliver through the
IDT table only those interrupts that do not belong to the current vcpu;
interrupts that belong to the current vcpu will be handled inside the VMX
handler. This will reduce the interrupt handling cost of KVM.

Also, the interrupt enabling logic changes if this feature is turned on:
before this patch, the hypervisor called local_irq_enable() to enable
interrupts directly. Now the IF bit is set on the interrupt stack frame,
and interrupts will be enabled on return from the interrupt handler if an
external interrupt exists. If there is no external interrupt, we still
call local_irq_enable() to enable interrupts.

Refer to Intel SDM Volume 3, Chapter 33.2.
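
For reference, the dispatch described above first locates the host IDT gate
for the vector and then jumps to it on a faked interrupt stack frame. A
stand-alone sketch of the lookup step (the helper name is hypothetical; in
the patch this is open-coded in vmx_handle_external_intr()):

static unsigned long host_intr_entry(unsigned long idt_base,
				     unsigned int vector)
{
	gate_desc *desc;

#ifdef CONFIG_X86_64
	desc = (void *)idt_base + vector * 16;	/* 64-bit IDT gates are 16 bytes */
#else
	desc = (void *)idt_base + vector * 8;	/* 32-bit IDT gates are 8 bytes */
#endif
	return gate_offset(*desc);		/* entry point of the real handler */
}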

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/svm.c              |    6 +++
 arch/x86/kvm/vmx.c              |   69 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |    4 ++-
 4 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 635a74d..b8388e9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -730,6 +730,7 @@ struct kvm_x86_ops {
 	int (*check_intercept)(struct kvm_vcpu *vcpu,
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage);
+	void (*handle_external_intr)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e1b1ce2..a7d60d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4247,6 +4247,11 @@ out:
 	return ret;
 }
 
+static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
+{
+	local_irq_enable();
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -4342,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_tdp_cr3 = set_tdp_cr3,
 
 	.check_intercept = svm_check_intercept,
+	.handle_external_intr = svm_handle_external_intr,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6667042..436b134 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -377,6 +377,7 @@ struct vcpu_vmx {
 	struct shared_msr_entry *guest_msrs;
 	int                   nmsrs;
 	int                   save_nmsrs;
+	unsigned long	      host_idt_base;
 #ifdef CONFIG_X86_64
 	u64 		      msr_host_kernel_gs_base;
 	u64 		      msr_guest_kernel_gs_base;
@@ -2605,7 +2606,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif
-	opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
+	opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
+		VM_EXIT_ACK_INTR_ON_EXIT;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
@@ -3875,7 +3877,7 @@ static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
  * Note that host-state that does change is set elsewhere. E.g., host-state
  * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
  */
-static void vmx_set_constant_host_state(void)
+static void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 {
 	u32 low32, high32;
 	unsigned long tmpl;
@@ -3903,6 +3905,7 @@ static void vmx_set_constant_host_state(void)
 
 	native_store_idt(&dt);
 	vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
+	vmx->host_idt_base = dt.address;
 
 	vmcs_writel(HOST_RIP, vmx_return); /* 22.2.5 */
 
@@ -4035,7 +4038,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 	vmcs_write16(HOST_FS_SELECTOR, 0);            /* 22.2.4 */
 	vmcs_write16(HOST_GS_SELECTOR, 0);            /* 22.2.4 */
-	vmx_set_constant_host_state();
+	vmx_set_constant_host_state(vmx);
 #ifdef CONFIG_X86_64
 	rdmsrl(MSR_FS_BASE, a);
 	vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
@@ -6346,6 +6349,63 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 	}
 }
 
+static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
+{
+	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
+	/*
+	 * If external interrupt exists, IF bit is set in rflags/eflags on the
+	 * interrupt stack frame, and interrupt will be enabled on a return
+	 * from interrupt handler.
+	 */
+	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
+			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
+		unsigned int vector;
+		unsigned long entry;
+		gate_desc *desc;
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+		vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
+#ifdef CONFIG_X86_64
+		desc = (void *)vmx->host_idt_base + vector * 16;
+#else
+		desc = (void *)vmx->host_idt_base + vector * 8;
+#endif
+
+		entry = gate_offset(*desc);
+		asm(
+			"mov %0, %%" _ASM_DX " \n\t"
+#ifdef CONFIG_X86_64
+			"mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
+			"and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
+			"mov %%ss, %%" _ASM_AX " \n\t"
+			"push %%" _ASM_AX " \n\t"
+			"push %%" _ASM_BX " \n\t"
+#endif
+			"pushf \n\t"
+			"pop %%" _ASM_AX " \n\t"
+			"or $0x200, %%" _ASM_AX " \n\t"
+			"push %%" _ASM_AX " \n\t"
+			"mov %%cs, %%" _ASM_AX " \n\t"
+			"push %%" _ASM_AX " \n\t"
+			"push intr_return \n\t"
+			"jmp *%% " _ASM_DX " \n\t"
+			"1: \n\t"
+			".pushsection .rodata \n\t"
+			".global intr_return \n\t"
+			"intr_return: " _ASM_PTR " 1b \n\t"
+			".popsection \n\t"
+			: : "m"(entry) :
+#ifdef CONFIG_X86_64
+			"rax", "rbx", "rdx"
+#else
+			"eax", "edx"
+#endif
+			);
+	} else
+		local_irq_enable();
+}
+
 static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7016,7 +7076,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * Other fields are different per CPU, and will be set later when
 	 * vmx_vcpu_load() is called, and when vmx_save_host_state() is called.
 	 */
-	vmx_set_constant_host_state();
+	vmx_set_constant_host_state(vmx);
 
 	/*
 	 * HOST_RSP is normally set correctly in vmx_vcpu_run() just before
@@ -7618,6 +7678,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_tdp_cr3 = vmx_set_cr3,
 
 	.check_intercept = vmx_check_intercept,
+	.handle_external_intr = vmx_handle_external_intr,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c5bb6f..f1fa37e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5784,7 +5784,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();
-	local_irq_enable();
+
+	/* Interrupt is enabled by handle_external_intr() */
+	kvm_x86_ops->handle_external_intr(vcpu);
 
 	++vcpu->stat.exits;
 
-- 
1.7.1



* [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt support
  2013-02-19 13:39 [PATCH v3 0/2] KVM: VMX: Add Posted Interrupt support Yang Zhang
  2013-02-19 13:39 ` [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit Yang Zhang
@ 2013-02-19 13:39 ` Yang Zhang
  2013-02-21  6:04   ` Zhang, Yang Z
  1 sibling, 1 reply; 16+ messages in thread
From: Yang Zhang @ 2013-02-19 13:39 UTC (permalink / raw)
  To: kvm; +Cc: gleb, mtosatti, Yang Zhang

From: Yang Zhang <yang.z.zhang@Intel.com>

Posted Interrupt allows APIC interrupts to be injected into the guest
directly without any vmexit.

- When delivering an interrupt to the guest, if the target vcpu is running,
  update the Posted-interrupt requests bitmap and send a notification event
  to the vcpu. The vcpu will then handle this interrupt automatically,
  without any software involvement (see the sketch below).

- If the target vcpu is not running, or a notification event is already
  pending in the vcpu, do nothing. The interrupt will be handled on the
  next vm entry.
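
In outline, the running-vcpu path above looks like the following sketch
(condensed from vmx_deliver_posted_interrupt() in this patch; the
result/coalescing bookkeeping is omitted):

	/*
	 * Post the vector; if the vcpu is in guest mode and no notification
	 * is outstanding yet, send the notification IPI so that hardware
	 * injects the interrupt without a vmexit.
	 */
	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
		return;		/* vector was already pending in the pir */
	if (!pi_test_and_set_on(&vmx->pi_desc) && vcpu->mode == IN_GUEST_MODE)
		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
				    POSTED_INTR_VECTOR);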

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 arch/x86/include/asm/entry_arch.h  |    4 +
 arch/x86/include/asm/hw_irq.h      |    1 +
 arch/x86/include/asm/irq_vectors.h |    5 +
 arch/x86/include/asm/kvm_host.h    |    3 +
 arch/x86/include/asm/vmx.h         |    4 +
 arch/x86/kernel/entry_64.S         |    5 +
 arch/x86/kernel/irq.c              |   20 +++++
 arch/x86/kernel/irqinit.c          |    4 +
 arch/x86/kvm/lapic.c               |   19 ++++-
 arch/x86/kvm/lapic.h               |    1 +
 arch/x86/kvm/svm.c                 |   13 +++
 arch/x86/kvm/vmx.c                 |  157 +++++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c                 |    1 +
 13 files changed, 214 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index 40afa00..9bd4eca 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -19,6 +19,10 @@ BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
 
 BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
 
+#ifdef CONFIG_HAVE_KVM
+BUILD_INTERRUPT(kvm_posted_intr_ipi, POSTED_INTR_VECTOR)
+#endif
+
 /*
  * every pentium local APIC has two 'local interrupts', with a
  * soft-definable vector attached to both interrupts, one of
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index eb92a6e..cebef02 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -28,6 +28,7 @@
 /* Interrupt handlers registered during init_IRQ */
 extern void apic_timer_interrupt(void);
 extern void x86_platform_ipi(void);
+extern void kvm_posted_intr_ipi(void);
 extern void error_interrupt(void);
 extern void irq_work_interrupt(void);
 
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 1508e51..774dc9f 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -102,6 +102,11 @@
  */
 #define X86_PLATFORM_IPI_VECTOR		0xf7
 
+/* Vector for KVM to deliver posted interrupt IPI */
+#ifdef CONFIG_HAVE_KVM
+#define POSTED_INTR_VECTOR		0xf2
+#endif
+
 /*
  * IRQ work vector:
  */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b8388e9..79da55e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -704,6 +704,9 @@ struct kvm_x86_ops {
 	void (*hwapic_isr_update)(struct kvm *kvm, int isr);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+	bool (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector,
+					int *result, bool *delivered);
+	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 5c9dbad..ce8ac80 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -158,6 +158,7 @@
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
 #define PIN_BASED_NMI_EXITING                   0x00000008
 #define PIN_BASED_VIRTUAL_NMIS                  0x00000020
+#define PIN_BASED_POSTED_INTR                   0x00000080
 
 #define VM_EXIT_SAVE_DEBUG_CONTROLS             0x00000002
 #define VM_EXIT_HOST_ADDR_SPACE_SIZE            0x00000200
@@ -180,6 +181,7 @@
 /* VMCS Encodings */
 enum vmcs_field {
 	VIRTUAL_PROCESSOR_ID            = 0x00000000,
+	POSTED_INTR_NV                  = 0x00000002,
 	GUEST_ES_SELECTOR               = 0x00000800,
 	GUEST_CS_SELECTOR               = 0x00000802,
 	GUEST_SS_SELECTOR               = 0x00000804,
@@ -214,6 +216,8 @@ enum vmcs_field {
 	VIRTUAL_APIC_PAGE_ADDR_HIGH     = 0x00002013,
 	APIC_ACCESS_ADDR		= 0x00002014,
 	APIC_ACCESS_ADDR_HIGH		= 0x00002015,
+	POSTED_INTR_DESC_ADDR           = 0x00002016,
+	POSTED_INTR_DESC_ADDR_HIGH      = 0x00002017,
 	EPT_POINTER                     = 0x0000201a,
 	EPT_POINTER_HIGH                = 0x0000201b,
 	EOI_EXIT_BITMAP0                = 0x0000201c,
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 70641af..b409846 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1177,6 +1177,11 @@ apicinterrupt LOCAL_TIMER_VECTOR \
 apicinterrupt X86_PLATFORM_IPI_VECTOR \
 	x86_platform_ipi smp_x86_platform_ipi
 
+#ifdef CONFIG_HAVE_KVM
+apicinterrupt POSTED_INTR_VECTOR \
+	kvm_posted_intr_ipi smp_posted_intr_ipi
+#endif
+
 apicinterrupt THRESHOLD_APIC_VECTOR \
 	threshold_interrupt smp_threshold_interrupt
 apicinterrupt THERMAL_APIC_VECTOR \
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e4595f1..da74d65 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
+#ifdef CONFIG_HAVE_KVM
+/*
+ * Handler for POSTED_INTERRUPT_VECTOR.
+ */
+void smp_posted_intr_ipi(struct pt_regs *regs)
+{
+	struct pt_regs *old_regs = set_irq_regs(regs);
+
+	ack_APIC_irq();
+
+	irq_enter();
+
+	exit_idle();
+
+	irq_exit();
+
+	set_irq_regs(old_regs);
+}
+#endif
+
 EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 6e03b0d..2329a54 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -205,6 +205,10 @@ static void __init apic_intr_init(void)
 
 	/* IPI for X86 platform specific use */
 	alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
+#ifdef CONFIG_HAVE_KVM
+	/* IPI for KVM to deliver posted interrupt */
+	alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
+#endif
 
 	/* IPI vectors for APIC spurious and error interrupts */
 	alloc_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 02b51dd..ebc32bb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -357,6 +357,12 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }
 
+int kvm_apic_test_irr(int vec, struct kvm_lapic *apic)
+{
+	return apic_test_vector(vec, apic->regs + APIC_IRR);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_test_irr);
+
 static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
 {
 	apic->irr_pending = true;
@@ -379,6 +385,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 	if (!apic->irr_pending)
 		return -1;
 
+	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
 	result = apic_search_irr(apic);
 	ASSERT(result == -1 || result >= 16);
 
@@ -685,6 +692,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 {
 	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;
+	bool delivered = false;
 
 	switch (delivery_mode) {
 	case APIC_DM_LOWEST:
@@ -700,7 +708,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		} else
 			apic_clear_vector(vector, apic->regs + APIC_TMR);
 
-		result = !apic_test_and_set_irr(vector, apic);
+		if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector,
+						&result, &delivered))
+			result = !apic_test_and_set_irr(vector, apic);
+
 		trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
 					  trig_mode, vector, !result);
 		if (!result) {
@@ -710,8 +721,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			break;
 		}
 
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_vcpu_kick(vcpu);
+		if (!delivered) {
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
+			kvm_vcpu_kick(vcpu);
+		}
 		break;
 
 	case APIC_DM_REMRD:
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1676d34..1a7016c 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -157,5 +157,6 @@ static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
 void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 				struct kvm_lapic_irq *irq,
 				u64 *eoi_bitmap);
+int kvm_apic_test_irr(int vec, struct kvm_lapic *apic);
 
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a7d60d7..9e705e3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3591,6 +3591,17 @@ static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
 	return;
 }
 
+static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static bool svm_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
+		int vector, int *result, bool *delivered)
+{
+	return false;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4319,6 +4330,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.vm_has_apicv = svm_vm_has_apicv,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_isr_update = svm_hwapic_isr_update,
+	.sync_pir_to_irr = svm_sync_pir_to_irr,
+	.deliver_posted_interrupt = svm_deliver_posted_interrupt,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 436b134..2fdf537 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,7 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
-static bool __read_mostly enable_apicv_reg_vid;
+static bool __read_mostly enable_apicv = 1;
+module_param(enable_apicv, bool, S_IRUGO);
 
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
@@ -365,6 +366,36 @@ struct nested_vmx {
 	struct page *apic_access_page;
 };
 
+#define POSTED_INTR_ON  0
+/* Posted-Interrupt Descriptor */
+struct pi_desc {
+	u32 pir[8];     /* Posted interrupt requested */
+	union {
+		struct {
+			u8  on:1,
+			    rsvd:7;
+		} control;
+		u32 rsvd[8];
+	} u;
+} __aligned(64);
+
+static bool pi_test_and_set_on(struct pi_desc *pi_desc)
+{
+	return test_and_set_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->u.control);
+}
+
+static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
+{
+	return test_and_clear_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->u.control);
+}
+
+static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
@@ -429,6 +460,9 @@ struct vcpu_vmx {
 
 	bool rdtscp_enabled;
 
+	/* Posted interrupt descriptor */
+	struct pi_desc pi_desc;
+
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 };
@@ -783,6 +817,18 @@ static inline bool cpu_has_vmx_virtual_intr_delivery(void)
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 }
 
+static inline bool cpu_has_vmx_posted_intr(void)
+{
+	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_POSTED_INTR;
+}
+
+static inline bool cpu_has_vmx_apicv(void)
+{
+	return cpu_has_vmx_apic_register_virt() &&
+		cpu_has_vmx_virtual_intr_delivery() &&
+		cpu_has_vmx_posted_intr();
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2530,12 +2576,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
 
-	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
-	opt = PIN_BASED_VIRTUAL_NMIS;
-	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
-				&_pin_based_exec_control) < 0)
-		return -EIO;
-
 	min = CPU_BASED_HLT_EXITING |
 #ifdef CONFIG_X86_64
 	      CPU_BASED_CR8_LOAD_EXITING |
@@ -2612,6 +2652,17 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				&_vmexit_control) < 0)
 		return -EIO;
 
+	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
+	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR;
+	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
+				&_pin_based_exec_control) < 0)
+		return -EIO;
+
+	if (!(_cpu_based_2nd_exec_control &
+		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ||
+		!(_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT))
+		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
+
 	min = 0;
 	opt = VM_ENTRY_LOAD_IA32_PAT;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
@@ -2790,11 +2841,10 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_ple())
 		ple_gap = 0;
 
-	if (!cpu_has_vmx_apic_register_virt() ||
-				!cpu_has_vmx_virtual_intr_delivery())
-		enable_apicv_reg_vid = 0;
+	if (!cpu_has_vmx_apicv())
+		enable_apicv = 0;
 
-	if (enable_apicv_reg_vid)
+	if (enable_apicv)
 		kvm_x86_ops->update_cr8_intercept = NULL;
 	else
 		kvm_x86_ops->hwapic_irr_update = NULL;
@@ -3871,6 +3921,62 @@ static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
 			msr, MSR_TYPE_W);
 }
 
+static int vmx_vm_has_apicv(struct kvm *kvm)
+{
+	return enable_apicv && irqchip_in_kernel(kvm);
+}
+
+static bool vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
+		int vector, int *result, bool *delivered)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx_vm_has_apicv(vcpu->kvm))
+		return false;
+
+	if (kvm_apic_test_irr(vector, vcpu->arch.apic))
+		goto out;
+	else {
+		*result = !pi_test_and_set_pir(vector, &vmx->pi_desc);
+		if (!*result)
+			goto out;
+	}
+
+	if (!pi_test_and_set_on(&vmx->pi_desc) &&
+			(vcpu->mode == IN_GUEST_MODE)) {
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
+				    POSTED_INTR_VECTOR);
+		*delivered = true;
+	}
+out:
+	return true;
+}
+
+static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	unsigned int i, old, new, ret_val, irr_offset, pir_val;
+
+	if (!vmx_vm_has_apicv(vcpu->kvm) ||
+			!pi_test_and_clear_on(&vmx->pi_desc))
+		return;
+
+	for (i = 0; i <= 7; i++) {
+		pir_val = xchg(&vmx->pi_desc.pir[i], 0);
+		if (pir_val) {
+			irr_offset = APIC_IRR + i * 0x10;
+			do {
+				old = kvm_apic_get_reg(apic, irr_offset);
+				new = old | pir_val;
+				ret_val = cmpxchg((u32 *)(apic->regs +
+						irr_offset), old, new);
+			} while (unlikely(ret_val != old));
+		}
+	}
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -3931,6 +4037,15 @@ static void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
 	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
 }
 
+static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
+{
+	u32 pin_based_exec_ctrl = vmcs_config.pin_based_exec_ctrl;
+
+	if (!vmx_vm_has_apicv(vmx->vcpu.kvm))
+		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR;
+	return pin_based_exec_ctrl;
+}
+
 static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 {
 	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
@@ -3948,11 +4063,6 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 	return exec_control;
 }
 
-static int vmx_vm_has_apicv(struct kvm *kvm)
-{
-	return enable_apicv_reg_vid && irqchip_in_kernel(kvm);
-}
-
 static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 {
 	u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;
@@ -4008,8 +4118,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
 
 	/* Control */
-	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
-		vmcs_config.pin_based_exec_ctrl);
+	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
 
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
 
@@ -4018,13 +4127,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				vmx_secondary_exec_control(vmx));
 	}
 
-	if (enable_apicv_reg_vid) {
+	if (vmx_vm_has_apicv(vmx->vcpu.kvm)) {
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
 		vmcs_write64(EOI_EXIT_BITMAP1, 0);
 		vmcs_write64(EOI_EXIT_BITMAP2, 0);
 		vmcs_write64(EOI_EXIT_BITMAP3, 0);
 
 		vmcs_write16(GUEST_INTR_STATUS, 0);
+
+		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
+		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
 	}
 
 	if (ple_gap) {
@@ -4174,6 +4286,9 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 		vmcs_write64(APIC_ACCESS_ADDR,
 			     page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
 
+	if (vmx_vm_has_apicv(vcpu->kvm))
+		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
+
 	if (vmx->vpid != 0)
 		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
 
@@ -7650,6 +7765,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
+	.sync_pir_to_irr = vmx_sync_pir_to_irr,
+	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
@@ -7753,7 +7870,7 @@ static int __init vmx_init(void)
 	memcpy(vmx_msr_bitmap_longmode_x2apic,
 			vmx_msr_bitmap_longmode, PAGE_SIZE);
 
-	if (enable_apicv_reg_vid) {
+	if (enable_apicv) {
 		for (msr = 0x800; msr <= 0x8ff; msr++)
 			vmx_disable_intercept_msr_read_x2apic(msr);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1fa37e..62f8c94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2679,6 +2679,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
+	kvm_x86_ops->sync_pir_to_irr(vcpu);
 	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
 
 	return 0;
-- 
1.7.1



* Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-19 13:39 ` [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit Yang Zhang
@ 2013-02-19 17:35   ` Avi Kivity
  2013-02-20  2:46     ` Zhang, Yang Z
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2013-02-19 17:35 UTC (permalink / raw)
  To: Yang Zhang; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang <yang.z.zhang@intel.com> wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
>
> The "acknowledge interrupt on exit" feature controls processor behavior
> for external interrupt acknowledgement. When this control is set, the
> processor acknowledges the interrupt controller to acquire the
> interrupt vector on VM exit.
>
> After enabling this feature, an interrupt that arrives while the target
> cpu is running in VMX non-root mode is handled by the VMX handler instead
> of the handler in the IDT. Currently, the VMX handler only fakes an
> interrupt stack and jumps to the IDT table to let the real handler handle
> it. In the future, we will recognize the interrupt and deliver through the
> IDT table only those interrupts that do not belong to the current vcpu;
> interrupts that belong to the current vcpu will be handled inside the VMX
> handler. This will reduce the interrupt handling cost of KVM.
>
> Also, the interrupt enabling logic changes if this feature is turned on:
> before this patch, the hypervisor called local_irq_enable() to enable
> interrupts directly. Now the IF bit is set on the interrupt stack frame,
> and interrupts will be enabled on return from the interrupt handler if an
> external interrupt exists. If there is no external interrupt, we still
> call local_irq_enable() to enable interrupts.
>
> Refer to Intel SDM Volume 3, Chapter 33.2.
>

> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> +{
> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> +
> +       /*
> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
> +        * interrupt stack frame, and interrupt will be enabled on a return
> +        * from interrupt handler.
> +        */
> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
> +               unsigned int vector;
> +               unsigned long entry;
> +               gate_desc *desc;
> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
> +#ifdef CONFIG_X86_64
> +               desc = (void *)vmx->host_idt_base + vector * 16;
> +#else
> +               desc = (void *)vmx->host_idt_base + vector * 8;
> +#endif
> +
> +               entry = gate_offset(*desc);
> +               asm(
> +                       "mov %0, %%" _ASM_DX " \n\t"
> +#ifdef CONFIG_X86_64
> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
> +                       "mov %%ss, %%" _ASM_AX " \n\t"
> +                       "push %%" _ASM_AX " \n\t"
> +                       "push %%" _ASM_BX " \n\t"
> +#endif

Are we sure no interrupts are using the IST feature?  I guess it's unlikely.

> +                       "pushf \n\t"
> +                       "pop %%" _ASM_AX " \n\t"
> +                       "or $0x200, %%" _ASM_AX " \n\t"
> +                       "push %%" _ASM_AX " \n\t"

Can simplify to pushf; orl $0x200, %%rsp.

> +                       "mov %%cs, %%" _ASM_AX " \n\t"
> +                       "push %%" _ASM_AX " \n\t"

push %%cs

> +                       "push intr_return \n\t"

push $1f.  Or even combine with the next instruction, and call %rdx.

> +                       "jmp *%% " _ASM_DX " \n\t"
> +                       "1: \n\t"
> +                       ".pushsection .rodata \n\t"
> +                       ".global intr_return \n\t"
> +                       "intr_return: " _ASM_PTR " 1b \n\t"
> +                       ".popsection \n\t"
> +                       : : "m"(entry) :
> +#ifdef CONFIG_X86_64
> +                       "rax", "rbx", "rdx"
> +#else
> +                       "eax", "edx"
> +#endif
> +                       );
> +       } else
> +               local_irq_enable();
> +}
> +


* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-19 17:35   ` Avi Kivity
@ 2013-02-20  2:46     ` Zhang, Yang Z
  2013-02-20 10:10       ` Gleb Natapov
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-20  2:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

Avi Kivity wrote on 2013-02-20:
> On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang <yang.z.zhang@intel.com> wrote:
>> From: Yang Zhang <yang.z.zhang@Intel.com>
>> 
>> The "acknowledge interrupt on exit" feature controls processor behavior
>> for external interrupt acknowledgement. When this control is set, the
>> processor acknowledges the interrupt controller to acquire the
>> interrupt vector on VM exit.
>>
>> After enabling this feature, an interrupt that arrives while the target
>> cpu is running in VMX non-root mode is handled by the VMX handler instead
>> of the handler in the IDT. Currently, the VMX handler only fakes an
>> interrupt stack and jumps to the IDT table to let the real handler handle
>> it. In the future, we will recognize the interrupt and deliver through the
>> IDT table only those interrupts that do not belong to the current vcpu;
>> interrupts that belong to the current vcpu will be handled inside the VMX
>> handler. This will reduce the interrupt handling cost of KVM.
>>
>> Also, the interrupt enabling logic changes if this feature is turned on:
>> before this patch, the hypervisor called local_irq_enable() to enable
>> interrupts directly. Now the IF bit is set on the interrupt stack frame,
>> and interrupts will be enabled on return from the interrupt handler if an
>> external interrupt exists. If there is no external interrupt, we still
>> call local_irq_enable() to enable interrupts.
>>
>> Refer to Intel SDM Volume 3, Chapter 33.2.
>> 
>> 
>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>> +{
>> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>> +
>> +       /*
>> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
>> +        * interrupt stack frame, and interrupt will be enabled on a return
>> +        * from interrupt handler.
>> +        */
>> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>> +               unsigned int vector;
>> +               unsigned long entry;
>> +               gate_desc *desc;
>> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
>> +#ifdef CONFIG_X86_64
>> +               desc = (void *)vmx->host_idt_base + vector * 16;
>> +#else
>> +               desc = (void *)vmx->host_idt_base + vector * 8;
>> +#endif
>> +
>> +               entry = gate_offset(*desc);
>> +               asm(
>> +                       "mov %0, %%" _ASM_DX " \n\t"
>> +#ifdef CONFIG_X86_64
>> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>> +                       "mov %%ss, %%" _ASM_AX " \n\t"
>> +                       "push %%" _ASM_AX " \n\t"
>> +                       "push %%" _ASM_BX " \n\t"
>> +#endif
> 
> Are we sure no interrupts are using the IST feature?  I guess it's unlikely.
Linux uses IST for NMI, stack fault, machine-check, double fault and debug
interrupts. No external interrupt will use it. This feature is only for
external interrupts, so we don't need to check for it here.

> 
>> +                       "pushf \n\t"
>> +                       "pop %%" _ASM_AX " \n\t"
>> +                       "or $0x200, %%" _ASM_AX " \n\t"
>> +                       "push %%" _ASM_AX " \n\t"
> 
> Can simplify to pushf; orl $0x200, %%rsp.
Sure.

>> +                       "mov %%cs, %%" _ASM_AX " \n\t"
>> +                       "push %%" _ASM_AX " \n\t"
> 
> push %%cs
"push %%cs" is invalid in x86_64.

>> +                       "push intr_return \n\t"
> 
> push $1f.  Or even combine with the next instruction, and call %rdx.
Which is faster? jmp or call?

>> +                       "jmp *%% " _ASM_DX " \n\t"
>> +                       "1: \n\t"
>> +                       ".pushsection .rodata \n\t"
>> +                       ".global intr_return \n\t"
>> +                       "intr_return: " _ASM_PTR " 1b \n\t"
>> +                       ".popsection \n\t"
>> +                       : : "m"(entry) :
>> +#ifdef CONFIG_X86_64
>> +                       "rax", "rbx", "rdx"
>> +#else
>> +                       "eax", "edx"
>> +#endif
>> +                       );
>> +       } else
>> +               local_irq_enable();
>> +}
>> +


Best regards,
Yang




* Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-20  2:46     ` Zhang, Yang Z
@ 2013-02-20 10:10       ` Gleb Natapov
  2013-02-20 11:07         ` Zhang, Yang Z
  2013-02-20 12:35       ` Avi Kivity
       [not found]       ` <CAG7+5M1c7mtENHao+1yFCQkQus78HXK+QQBi3vwE6chAr_ZxVA@mail.gmail.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2013-02-20 10:10 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Avi Kivity, kvm, Marcelo Tosatti

On Wed, Feb 20, 2013 at 02:46:05AM +0000, Zhang, Yang Z wrote:
> Avi Kivity wrote on 2013-02-20:
> > On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang <yang.z.zhang@intel.com> wrote:
> >> From: Yang Zhang <yang.z.zhang@Intel.com>
> >> 
> >> The "acknowledge interrupt on exit" feature controls processor behavior
> >> for external interrupt acknowledgement. When this control is set, the
> >> processor acknowledges the interrupt controller to acquire the
> >> interrupt vector on VM exit.
> >>
> >> After enabling this feature, an interrupt that arrives while the target
> >> cpu is running in VMX non-root mode is handled by the VMX handler instead
> >> of the handler in the IDT. Currently, the VMX handler only fakes an
> >> interrupt stack and jumps to the IDT table to let the real handler handle
> >> it. In the future, we will recognize the interrupt and deliver through the
> >> IDT table only those interrupts that do not belong to the current vcpu;
> >> interrupts that belong to the current vcpu will be handled inside the VMX
> >> handler. This will reduce the interrupt handling cost of KVM.
> >>
> >> Also, the interrupt enabling logic changes if this feature is turned on:
> >> before this patch, the hypervisor called local_irq_enable() to enable
> >> interrupts directly. Now the IF bit is set on the interrupt stack frame,
> >> and interrupts will be enabled on return from the interrupt handler if an
> >> external interrupt exists. If there is no external interrupt, we still
> >> call local_irq_enable() to enable interrupts.
> >>
> >> Refer to Intel SDM Volume 3, Chapter 33.2.
> >> 
> >> 
> >> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> >> +{
> >> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> >> +
> >> +       /*
> >> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
> >> +        * interrupt stack frame, and interrupt will be enabled on a return
> >> +        * from interrupt handler.
> >> +        */
> >> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
> >> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
> >> +               unsigned int vector;
> >> +               unsigned long entry;
> >> +               gate_desc *desc;
> >> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> +
> >> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
> >> +#ifdef CONFIG_X86_64
> >> +               desc = (void *)vmx->host_idt_base + vector * 16;
> >> +#else
> >> +               desc = (void *)vmx->host_idt_base + vector * 8;
> >> +#endif
> >> +
> >> +               entry = gate_offset(*desc);
> >> +               asm(
> >> +                       "mov %0, %%" _ASM_DX " \n\t"
> >> +#ifdef CONFIG_X86_64
> >> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
> >> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
> >> +                       "mov %%ss, %%" _ASM_AX " \n\t"
> >> +                       "push %%" _ASM_AX " \n\t"
> >> +                       "push %%" _ASM_BX " \n\t"
> >> +#endif
> > 
> > Are we sure no interrupts are using the IST feature?  I guess it's unlikely.
> Linux uses IST for NMI, stack fault, machine-check, double fault and debug
> interrupts. No external interrupt will use it. This feature is only for
> external interrupts, so we don't need to check for it here.
> 
> > 
> >> +                       "pushf \n\t"
> >> +                       "pop %%" _ASM_AX " \n\t"
> >> +                       "or $0x200, %%" _ASM_AX " \n\t"
> >> +                       "push %%" _ASM_AX " \n\t"
> > 
> > Can simplify to pushf; orl $0x200, %%rsp.
> Sure.
> 
> >> +                       "mov %%cs, %%" _ASM_AX " \n\t"
> >> +                       "push %%" _ASM_AX " \n\t"
> > 
> > push %%cs
> "push %%cs" is invalid in x86_64.
> 
> >> +                       "push intr_return \n\t"
> > 
> > push $1f.  Or even combine with the next instruction, and call %rdx.
> Which is faster? jmp or call?
> 
Wrong question. You need to compare push+jmp with call. I do not see why
the latter would be slower.

> >> +                       "jmp *%% " _ASM_DX " \n\t"
> >> +                       "1: \n\t"
> >> +                       ".pushsection .rodata \n\t"
> >> +                       ".global intr_return \n\t"
> >> +                       "intr_return: " _ASM_PTR " 1b \n\t"
> >> +                       ".popsection \n\t"
> >> +                       : : "m"(entry) :
> >> +#ifdef CONFIG_X86_64
> >> +                       "rax", "rbx", "rdx"
> >> +#else
> >> +                       "eax", "edx"
> >> +#endif
> >> +                       );
> >> +       } else
> >> +               local_irq_enable();
> >> +}
> >> +
> 
> 
> Best regards,
> Yang
> 

--
			Gleb.


* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-20 10:10       ` Gleb Natapov
@ 2013-02-20 11:07         ` Zhang, Yang Z
  0 siblings, 0 replies; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-20 11:07 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, kvm, Marcelo Tosatti

Gleb Natapov wrote on 2013-02-20:
> On Wed, Feb 20, 2013 at 02:46:05AM +0000, Zhang, Yang Z wrote:
>> Avi Kivity wrote on 2013-02-20:
>>> On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang <yang.z.zhang@intel.com> wrote:
>>>> From: Yang Zhang <yang.z.zhang@Intel.com>
>>>> 
>>>> The "acknowledge interrupt on exit" feature controls processor behavior
>>>> for external interrupt acknowledgement. When this control is set, the
>>>> processor acknowledges the interrupt controller to acquire the
>>>> interrupt vector on VM exit.
>>>>
>>>> After enabling this feature, an interrupt that arrives while the target
>>>> cpu is running in VMX non-root mode is handled by the VMX handler instead
>>>> of the handler in the IDT. Currently, the VMX handler only fakes an
>>>> interrupt stack and jumps to the IDT table to let the real handler handle
>>>> it. In the future, we will recognize the interrupt and deliver through the
>>>> IDT table only those interrupts that do not belong to the current vcpu;
>>>> interrupts that belong to the current vcpu will be handled inside the VMX
>>>> handler. This will reduce the interrupt handling cost of KVM.
>>>>
>>>> Also, the interrupt enabling logic changes if this feature is turned on:
>>>> before this patch, the hypervisor called local_irq_enable() to enable
>>>> interrupts directly. Now the IF bit is set on the interrupt stack frame,
>>>> and interrupts will be enabled on return from the interrupt handler if an
>>>> external interrupt exists. If there is no external interrupt, we still
>>>> call local_irq_enable() to enable interrupts.
>>>>
>>>> Refer to Intel SDM Volume 3, Chapter 33.2.
>>>> 
>>>> 
>>>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>>>> +
>>>> +       /*
>>>> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
>>>> +        * interrupt stack frame, and interrupt will be enabled on a return
>>>> +        * from interrupt handler.
>>>> +        */
>>>> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>>>> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>>>> +               unsigned int vector;
>>>> +               unsigned long entry;
>>>> +               gate_desc *desc;
>>>> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +
>>>> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
>>>> +#ifdef CONFIG_X86_64
>>>> +               desc = (void *)vmx->host_idt_base + vector * 16;
>>>> +#else
>>>> +               desc = (void *)vmx->host_idt_base + vector * 8;
>>>> +#endif
>>>> +
>>>> +               entry = gate_offset(*desc);
>>>> +               asm(
>>>> +                       "mov %0, %%" _ASM_DX " \n\t"
>>>> +#ifdef CONFIG_X86_64
>>>> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>>>> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>>>> +                       "mov %%ss, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_BX " \n\t"
>>>> +#endif
>>> 
>>> Are we sure no interrupts are using the IST feature?  I guess it's unlikely.
>> Linux uses IST for NMI, stack fault, machine-check, double fault and debug
>> interrupts. No external interrupt will use it. This feature is only for
>> external interrupts, so we don't need to check for it here.
>> 
>>> 
>>>> +                       "pushf \n\t"
>>>> +                       "pop %%" _ASM_AX " \n\t"
>>>> +                       "or $0x200, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>> 
>>> Can simplify to pushf; orl $0x200, %%rsp.
>> Sure.
>> 
>>>> +                       "mov %%cs, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>> 
>>> push %%cs
>> "push %%cs" is invalid in x86_64.
>> 
>>>> +                       "push intr_return \n\t"
>>> 
>>> push $1f.  Or even combine with the next instruction, and call %rdx.
>> Which is faster? jmp or call?
>> 
> Wrong question. You need to compare push+jmp with call. I do not see why
Sorry, I didn't express it clearly.  Yes, I want to compare push+jmp with call.

> the latter would be slower.
I think so. If push+jmp is not faster than call, I will use the latter.

>>>> +                       "jmp *%% " _ASM_DX " \n\t"
>>>> +                       "1: \n\t"
>>>> +                       ".pushsection .rodata \n\t"
>>>> +                       ".global intr_return \n\t"
>>>> +                       "intr_return: " _ASM_PTR " 1b \n\t"
>>>> +                       ".popsection \n\t"
>>>> +                       : : "m"(entry) :
>>>> +#ifdef CONFIG_X86_64
>>>> +                       "rax", "rbx", "rdx"
>>>> +#else
>>>> +                       "eax", "edx"
>>>> +#endif
>>>> +                       );
>>>> +       } else
>>>> +               local_irq_enable();
>>>> +}
>>>> +
>> 
>> 
>> Best regards,
>> Yang
>> 
> 
> --
> 			Gleb.


Best regards,
Yang



* Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-20  2:46     ` Zhang, Yang Z
  2013-02-20 10:10       ` Gleb Natapov
@ 2013-02-20 12:35       ` Avi Kivity
  2013-02-20 13:10         ` Zhang, Yang Z
       [not found]       ` <CAG7+5M1c7mtENHao+1yFCQkQus78HXK+QQBi3vwE6chAr_ZxVA@mail.gmail.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2013-02-20 12:35 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

On Wed, Feb 20, 2013 at 4:46 AM, Zhang, Yang Z <yang.z.zhang@intel.com> wrote:
>>>
>>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>> +{
>>> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>>> +
>>> +       /*
>>> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
>>> +        * interrupt stack frame, and interrupt will be enabled on a return
>>> +        * from interrupt handler.
>>> +        */
>>> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>>> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>>> +               unsigned int vector;
>>> +               unsigned long entry;
>>> +               gate_desc *desc;
>>> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> +
>>> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
>>> +#ifdef CONFIG_X86_64
>>> +               desc = (void *)vmx->host_idt_base + vector * 16;
>>> +#else
>>> +               desc = (void *)vmx->host_idt_base + vector * 8;
>>> +#endif
>>> +
>>> +               entry = gate_offset(*desc);
>>> +               asm(
>>> +                       "mov %0, %%" _ASM_DX " \n\t"
>>> +#ifdef CONFIG_X86_64
>>> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>>> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>>> +                       "mov %%ss, %%" _ASM_AX " \n\t"
>>> +                       "push %%" _ASM_AX " \n\t"
>>> +                       "push %%" _ASM_BX " \n\t"
>>> +#endif
>>
>> Are we sure no interrupts are using the IST feature?  I guess it's unlikely.
> Linux uses IST for NMI, stack fault, machine-check, double fault and debug
> interrupts. No external interrupt will use it. This feature is only for
> external interrupts, so we don't need to check for it here.

Ok, thanks for checking.

>
>>
>>> +                       "pushf \n\t"
>>> +                       "pop %%" _ASM_AX " \n\t"
>>> +                       "or $0x200, %%" _ASM_AX " \n\t"
>>> +                       "push %%" _ASM_AX " \n\t"
>>
>> Can simplify to pushf; orl $0x200, %%rsp.
> Sure.
>
>>> +                       "mov %%cs, %%" _ASM_AX " \n\t"
>>> +                       "push %%" _ASM_AX " \n\t"
>>
>> push %%cs
> "push %%cs" is invalid in x86_64.

Oops. 'push[lq] $__KERNEL_CS' then.

>
>>> +                       "push intr_return \n\t"
>>
>> push $1f.  Or even combine with the next instruction, and call %rdx.
> Which is faster? jmp or call?

Actually it doesn't matter, the processor is clever enough to minimize
the difference.  But the code is simpler and shorter with 'call'.
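
For reference, the two entry sequences being compared (sketch; label 1:
is the point the handler's iret returns to in both cases):

	"push $1f \n\t"		/* push + jmp variant */
	"jmp *%[entry] \n\t"
	"1: \n\t"

	"call *%[entry] \n\t"	/* call variant: pushes the return address */
	"1: \n\t"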


* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-20 12:35       ` Avi Kivity
@ 2013-02-20 13:10         ` Zhang, Yang Z
  2013-02-20 15:10           ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-20 13:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

Avi Kivity wrote on 2013-02-20:
> On Wed, Feb 20, 2013 at 4:46 AM, Zhang, Yang Z <yang.z.zhang@intel.com> wrote:
>>>> 
>>>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>>>> +
>>>> +       /*
>>>> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
>>>> +        * interrupt stack frame, and interrupt will be enabled on a return
>>>> +        * from interrupt handler.
>>>> +        */
>>>> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>>>> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>>>> +               unsigned int vector;
>>>> +               unsigned long entry;
>>>> +               gate_desc *desc;
>>>> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +
>>>> +               vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
>>>> +#ifdef CONFIG_X86_64
>>>> +               desc = (void *)vmx->host_idt_base + vector * 16;
>>>> +#else
>>>> +               desc = (void *)vmx->host_idt_base + vector * 8;
>>>> +#endif
>>>> +
>>>> +               entry = gate_offset(*desc);
>>>> +               asm(
>>>> +                       "mov %0, %%" _ASM_DX " \n\t"
>>>> +#ifdef CONFIG_X86_64
>>>> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>>>> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>>>> +                       "mov %%ss, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_BX " \n\t"
>>>> +#endif
>>> 
>>> Are we sure no interrupts are using the IST feature?  I guess it's unlikely.
>> Linux uses IST for NMI, stack fault, machine-check, double fault and debug
>> interrupts. No external interrupt will use it. This feature is only for
>> external interrupts, so we don't need to check for it here.
> 
> Ok, thanks for checking.
> 
>> 
>>> 
>>>> +                       "pushf \n\t"
>>>> +                       "pop %%" _ASM_AX " \n\t"
>>>> +                       "or $0x200, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>> 
>>> Can simplify to pushf; orl $0x200, %%rsp.
>> Sure.
>> 
>>>> +                       "mov %%cs, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>> 
>>> push %%cs
>> "push %%cs" is invalid in x86_64.
> 
> Oops. 'push[lq] $__KERNEL_CS' then.
Is this right? I just copied it from another file.

#define __STR(X) #X
#define STR(X) __STR(X)

#ifdef CONFIG_X86_64
                        "pushq $"STR(__KERNEL_CS)" \n\t"
#else
                        "pushl $"STR(__KERNEL_CS)" \n\t"
#endif

#undef STR
#undef __STR
 
>> 
>>>> +                       "push intr_return \n\t"
>>> 
>>> push $1f.  Or even combine with the next instruction, and call %rdx.
>> Which is faster? jmp or call?
> 
> Actually it doesn't matter, the processor is clever enough to minimize
> the difference.  But the code is simpler and shorter with 'call'.
Yes, 'call' is better.

Best regards,
Yang




* Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interrupt on vmexit
  2013-02-20 13:10         ` Zhang, Yang Z
@ 2013-02-20 15:10           ` Avi Kivity
  2013-02-21  8:58             ` Zhang, Yang Z
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2013-02-20 15:10 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

On Wed, Feb 20, 2013 at 3:10 PM, Zhang, Yang Z <yang.z.zhang@intel.com> wrote:
>>>>
>>>> push %%cs
>>> "push %%cs" is invalid in x86_64.
>>
>> Oops. 'push[lq] $__KERNEL_CS' then.
> Is this right? I just copied it from another file.
>
> #define __STR(X) #X
> #define STR(X) __STR(X)
>
> #ifdef CONFIG_X86_64
>                         "pushq $"STR(__KERNEL_CS)" \n\t"
> #else
>                         "pushl $"STR(__KERNEL_CS)" \n\t"
> #endif
>
> #undef STR
> #undef __STR
>

Use __ASM_SIZE and an immediate operand for __KERNEL_CS:

 asm ( ... : : [cs]"i"(__KERNEL_CS) );

and the code will be cleaner.
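
A sketch of that form (assuming __ASM_SIZE(push) expands to pushl/pushq
as appropriate; the %c operand modifier prints the bare constant, so the
'$' is added back by hand and no STR() stringification is needed):

	asm(
		/* emits e.g. "pushq $16" on 64-bit */
		__ASM_SIZE(push) " $%c[cs] \n\t"
		: : [cs] "i" (__KERNEL_CS));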


* RE: [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt support
  2013-02-19 13:39 ` [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt support Yang Zhang
@ 2013-02-21  6:04   ` Zhang, Yang Z
  2013-02-21  6:22     ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-21  6:04 UTC (permalink / raw)
  To: Zhang, Yang Z, kvm; +Cc: gleb, mtosatti

Hi Marcelo,

Can you help review this patch? Many thanks if you can review it quickly.

Zhang, Yang Z wrote on 2013-02-19:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Posted Interrupt allows APIC interrupts to be injected into the guest
> directly without any vmexit.
>
> - When delivering an interrupt to the guest, if the target vcpu is running,
>   update the Posted-interrupt requests bitmap and send a notification event
>   to the vcpu. The vcpu will then handle this interrupt automatically,
>   without any software involvement.
>
> - If the target vcpu is not running, or a notification event is already
>   pending in the vcpu, do nothing. The interrupt will be handled on the
>   next vm entry.
>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  arch/x86/include/asm/entry_arch.h  |    4 +
>  arch/x86/include/asm/hw_irq.h      |    1 +
>  arch/x86/include/asm/irq_vectors.h |    5 +
>  arch/x86/include/asm/kvm_host.h    |    3 +
>  arch/x86/include/asm/vmx.h         |    4 +
>  arch/x86/kernel/entry_64.S         |    5 +
>  arch/x86/kernel/irq.c              |   20 +++++
>  arch/x86/kernel/irqinit.c          |    4 +
>  arch/x86/kvm/lapic.c               |   19 ++++-
>  arch/x86/kvm/lapic.h               |    1 +
>  arch/x86/kvm/svm.c                 |   13 +++
>  arch/x86/kvm/vmx.c                 |  157 +++++++++++++++++++++++++++++++-----
>  arch/x86/kvm/x86.c                 |    1 +
>  13 files changed, 214 insertions(+), 23 deletions(-)
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index 40afa00..9bd4eca 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -19,6 +19,10 @@ BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
> 
>  BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
> +#ifdef CONFIG_HAVE_KVM
> +BUILD_INTERRUPT(kvm_posted_intr_ipi, POSTED_INTR_VECTOR)
> +#endif
> +
>  /*
>   * every pentium local APIC has two 'local interrupts', with a
>   * soft-definable vector attached to both interrupts, one of
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index eb92a6e..cebef02 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -28,6 +28,7 @@
>  /* Interrupt handlers registered during init_IRQ */
>  extern void apic_timer_interrupt(void);
>  extern void x86_platform_ipi(void);
> +extern void kvm_posted_intr_ipi(void);
>  extern void error_interrupt(void);
>  extern void irq_work_interrupt(void);
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 1508e51..774dc9f 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -102,6 +102,11 @@
>   */
>  #define X86_PLATFORM_IPI_VECTOR		0xf7
> +/* Vector for KVM to deliver posted interrupt IPI */
> +#ifdef CONFIG_HAVE_KVM
> +#define POSTED_INTR_VECTOR		0xf2
> +#endif
> +
>  /*
>   * IRQ work vector:
>   */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b8388e9..79da55e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -704,6 +704,9 @@ struct kvm_x86_ops {
>  	void (*hwapic_isr_update)(struct kvm *kvm, int isr);
>  	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
>  	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
> +	bool (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector,
> +					int *result, bool *delivered);
> +	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 5c9dbad..ce8ac80 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -158,6 +158,7 @@
>  #define PIN_BASED_EXT_INTR_MASK                 0x00000001
>  #define PIN_BASED_NMI_EXITING                   0x00000008
>  #define PIN_BASED_VIRTUAL_NMIS                  0x00000020
> +#define PIN_BASED_POSTED_INTR                   0x00000080
> 
>  #define VM_EXIT_SAVE_DEBUG_CONTROLS             0x00000002
>  #define VM_EXIT_HOST_ADDR_SPACE_SIZE            0x00000200
> @@ -180,6 +181,7 @@
>  /* VMCS Encodings */
>  enum vmcs_field {
>  	VIRTUAL_PROCESSOR_ID            = 0x00000000,
> +	POSTED_INTR_NV                  = 0x00000002,
>  	GUEST_ES_SELECTOR               = 0x00000800,
>  	GUEST_CS_SELECTOR               = 0x00000802,
>  	GUEST_SS_SELECTOR               = 0x00000804,
> @@ -214,6 +216,8 @@ enum vmcs_field {
>  	VIRTUAL_APIC_PAGE_ADDR_HIGH     = 0x00002013,
>  	APIC_ACCESS_ADDR		= 0x00002014,
>  	APIC_ACCESS_ADDR_HIGH		= 0x00002015,
> +	POSTED_INTR_DESC_ADDR           = 0x00002016,
> +	POSTED_INTR_DESC_ADDR_HIGH      = 0x00002017,
>  	EPT_POINTER                     = 0x0000201a,
>  	EPT_POINTER_HIGH                = 0x0000201b,
>  	EOI_EXIT_BITMAP0                = 0x0000201c,
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 70641af..b409846 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1177,6 +1177,11 @@ apicinterrupt LOCAL_TIMER_VECTOR \
>  apicinterrupt X86_PLATFORM_IPI_VECTOR \
>  	x86_platform_ipi smp_x86_platform_ipi
> +#ifdef CONFIG_HAVE_KVM
> +apicinterrupt POSTED_INTR_VECTOR \
> +	kvm_posted_intr_ipi smp_posted_intr_ipi
> +#endif
> +
>  apicinterrupt THRESHOLD_APIC_VECTOR \
>  	threshold_interrupt smp_threshold_interrupt
>  apicinterrupt THERMAL_APIC_VECTOR \
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index e4595f1..da74d65 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>  	set_irq_regs(old_regs);
>  }
> +#ifdef CONFIG_HAVE_KVM
> +/*
> + * Handler for POSTED_INTERRUPT_VECTOR.
> + */
> +void smp_posted_intr_ipi(struct pt_regs *regs)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +
> +	ack_APIC_irq();
> +
> +	irq_enter();
> +
> +	exit_idle();
> +
> +	irq_exit();
> +
> +	set_irq_regs(old_regs);
> +}
> +#endif
> +
>  EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
> index 6e03b0d..2329a54 100644
> --- a/arch/x86/kernel/irqinit.c
> +++ b/arch/x86/kernel/irqinit.c
> @@ -205,6 +205,10 @@ static void __init apic_intr_init(void)
> 
>  	/* IPI for X86 platform specific use */
>  	alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
> +#ifdef CONFIG_HAVE_KVM
> +	/* IPI for KVM to deliver posted interrupt */
> +	alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
> +#endif
> 
>  	/* IPI vectors for APIC spurious and error interrupts */
>  	alloc_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 02b51dd..ebc32bb 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -357,6 +357,12 @@ static u8 count_vectors(void *bitmap)
>  	return count;
>  }
> +int kvm_apic_test_irr(int vec, struct kvm_lapic *apic)
> +{
> +	return apic_test_vector(vec, apic->regs + APIC_IRR);
> +}
> +EXPORT_SYMBOL_GPL(kvm_apic_test_irr);
> +
>  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
>  {
>  	apic->irr_pending = true;
> @@ -379,6 +385,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  	if (!apic->irr_pending)
>  		return -1;
> +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>  	result = apic_search_irr(apic);
>  	ASSERT(result == -1 || result >= 16);
> @@ -685,6 +692,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  {
>  	int result = 0;
>  	struct kvm_vcpu *vcpu = apic->vcpu;
> +	bool delivered = false;
> 
>  	switch (delivery_mode) {
>  	case APIC_DM_LOWEST:
> @@ -700,7 +708,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  		} else
>  			apic_clear_vector(vector, apic->regs + APIC_TMR);
> -		result = !apic_test_and_set_irr(vector, apic);
> +		if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector,
> +						&result, &delivered))
> +			result = !apic_test_and_set_irr(vector, apic);
> +
>  		trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
>  					  trig_mode, vector, !result);
>  		if (!result) {
> @@ -710,8 +721,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  			break;
>  		}
> -		kvm_make_request(KVM_REQ_EVENT, vcpu);
> -		kvm_vcpu_kick(vcpu);
> +		if (!delivered) {
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
> +			kvm_vcpu_kick(vcpu);
> +		}
>  		break;
>  
>  	case APIC_DM_REMRD:
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 1676d34..1a7016c 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -157,5 +157,6 @@ static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
>  void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
>  				struct kvm_lapic_irq *irq,
>  				u64 *eoi_bitmap);
> +int kvm_apic_test_irr(int vec, struct kvm_lapic *apic);
> 
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index a7d60d7..9e705e3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3591,6 +3591,17 @@ static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
>  	return;
>  }
> +static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +{
> +	return;
> +}
> +
> +static bool svm_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
> +		int vector, int *result, bool *delivered)
> +{
> +	return false;
> +}
> +
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4319,6 +4330,8 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.vm_has_apicv = svm_vm_has_apicv,
>  	.load_eoi_exitmap = svm_load_eoi_exitmap,
>  	.hwapic_isr_update = svm_hwapic_isr_update,
> +	.sync_pir_to_irr = svm_sync_pir_to_irr,
> +	.deliver_posted_interrupt = svm_deliver_posted_interrupt,
> 
>  	.set_tss_addr = svm_set_tss_addr,
>  	.get_tdp_level = get_npt_level,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 436b134..2fdf537 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -84,7 +84,8 @@ module_param(vmm_exclusive, bool, S_IRUGO);
>  static bool __read_mostly fasteoi = 1;
>  module_param(fasteoi, bool, S_IRUGO);
> -static bool __read_mostly enable_apicv_reg_vid;
> +static bool __read_mostly enable_apicv = 1;
> +module_param(enable_apicv, bool, S_IRUGO);
> 
>  /*
>   * If nested=1, nested virtualization is supported, i.e., guests may use
> @@ -365,6 +366,36 @@ struct nested_vmx {
>  	struct page *apic_access_page;
>  };
> +#define POSTED_INTR_ON  0
> +/* Posted-Interrupt Descriptor */
> +struct pi_desc {
> +	u32 pir[8];     /* Posted interrupt requested */
> +	union {
> +		struct {
> +			u8  on:1,
> +			    rsvd:7;
> +		} control;
> +		u32 rsvd[8];
> +	} u;
> +} __aligned(64);
> +
> +static bool pi_test_and_set_on(struct pi_desc *pi_desc)
> +{
> +	return test_and_set_bit(POSTED_INTR_ON,
> +			(unsigned long *)&pi_desc->u.control);
> +}
> +
> +static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
> +{
> +	return test_and_clear_bit(POSTED_INTR_ON,
> +			(unsigned long *)&pi_desc->u.control);
> +}
> +
> +static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
> +{
> +	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
> +}
> +
>  struct vcpu_vmx {
>  	struct kvm_vcpu       vcpu;
>  	unsigned long         host_rsp;
> @@ -429,6 +460,9 @@ struct vcpu_vmx {
> 
>  	bool rdtscp_enabled;
> +	/* Posted interrupt descriptor */
> +	struct pi_desc pi_desc;
> +
>  	/* Support for a guest hypervisor (nested VMX) */
>  	struct nested_vmx nested;
>  };
> @@ -783,6 +817,18 @@ static inline bool cpu_has_vmx_virtual_intr_delivery(void)
>  		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>  }
> +static inline bool cpu_has_vmx_posted_intr(void)
> +{
> +	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_POSTED_INTR;
> +}
> +
> +static inline bool cpu_has_vmx_apicv(void)
> +{
> +	return cpu_has_vmx_apic_register_virt() &&
> +		cpu_has_vmx_virtual_intr_delivery() &&
> +		cpu_has_vmx_posted_intr();
> +}
> +
>  static inline bool cpu_has_vmx_flexpriority(void)
>  {
>  	return cpu_has_vmx_tpr_shadow() &&
> @@ -2530,12 +2576,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	u32 _vmexit_control = 0;
>  	u32 _vmentry_control = 0;
> -	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
> -	opt = PIN_BASED_VIRTUAL_NMIS;
> -	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
> -				&_pin_based_exec_control) < 0)
> -		return -EIO;
> -
>  	min = CPU_BASED_HLT_EXITING |
>  #ifdef CONFIG_X86_64
>  	      CPU_BASED_CR8_LOAD_EXITING |
> @@ -2612,6 +2652,17 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  				&_vmexit_control) < 0)
>  		return -EIO;
> +	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
> +	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR;
> +	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
> +				&_pin_based_exec_control) < 0)
> +		return -EIO;
> +
> +	if (!(_cpu_based_2nd_exec_control &
> +		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ||
> +		!(_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT))
> +		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
> +
>  	min = 0;
>  	opt = VM_ENTRY_LOAD_IA32_PAT;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
> @@ -2790,11 +2841,10 @@ static __init int hardware_setup(void)
>  	if (!cpu_has_vmx_ple())
>  		ple_gap = 0;
> -	if (!cpu_has_vmx_apic_register_virt() ||
> -				!cpu_has_vmx_virtual_intr_delivery())
> -		enable_apicv_reg_vid = 0;
> +	if (!cpu_has_vmx_apicv())
> +		enable_apicv = 0;
> 
> -	if (enable_apicv_reg_vid)
> +	if (enable_apicv)
>  		kvm_x86_ops->update_cr8_intercept = NULL;
>  	else
>  		kvm_x86_ops->hwapic_irr_update = NULL;
> @@ -3871,6 +3921,62 @@ static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
>  			msr, MSR_TYPE_W);
>  }
> +static int vmx_vm_has_apicv(struct kvm *kvm)
> +{
> +	return enable_apicv && irqchip_in_kernel(kvm);
> +}
> +
> +static bool vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
> +		int vector, int *result, bool *delivered)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (!vmx_vm_has_apicv(vcpu->kvm))
> +		return false;
> +
> +	if (kvm_apic_test_irr(vector, vcpu->arch.apic))
> +		goto out;
> +	else {
> +		*result = !pi_test_and_set_pir(vector, &vmx->pi_desc);
> +		if (!*result)
> +			goto out;
> +	}
> +
> +	if (!pi_test_and_set_on(&vmx->pi_desc) &&
> +			(vcpu->mode == IN_GUEST_MODE)) {
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
> +				    POSTED_INTR_VECTOR);
> +		*delivered = true;
> +	}
> +out:
> +	return true;
> +}
> +
> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	unsigned int i, old, new, ret_val, irr_offset, pir_val;
> +
> +	if (!vmx_vm_has_apicv(vcpu->kvm) ||
> +			!pi_test_and_clear_on(&vmx->pi_desc))
> +		return;
> +
> +	for (i = 0; i <= 7; i++) {
> +		pir_val = xchg(&vmx->pi_desc.pir[i], 0);
> +		if (pir_val) {
> +			irr_offset = APIC_IRR + i * 0x10;
> +			do {
> +				old = kvm_apic_get_reg(apic, irr_offset);
> +				new = old | pir_val;
> +				ret_val = cmpxchg((u32 *)(apic->regs +
> +						irr_offset), old, new);
> +			} while (unlikely(ret_val != old));
> +		}
> +	}
> +}
> +
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -3931,6 +4037,15 @@ static void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
>  	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
>  }
> +static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
> +{
> +	u32 pin_based_exec_ctrl = vmcs_config.pin_based_exec_ctrl;
> +
> +	if (!vmx_vm_has_apicv(vmx->vcpu.kvm))
> +		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR;
> +	return pin_based_exec_ctrl;
> +}
> +
>  static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>  {
>  	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
> @@ -3948,11 +4063,6 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>  	return exec_control;
>  }
> -static int vmx_vm_has_apicv(struct kvm *kvm)
> -{
> -	return enable_apicv_reg_vid && irqchip_in_kernel(kvm);
> -}
> -
>  static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  {
>  	u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;
> @@ -4008,8 +4118,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  	vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
>  
>  	/* Control */
> -	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
> -		vmcs_config.pin_based_exec_ctrl);
> +	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> 
>  	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
> 
> @@ -4018,13 +4127,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  				vmx_secondary_exec_control(vmx));
>  	}
> -	if (enable_apicv_reg_vid) {
> +	if (vmx_vm_has_apicv(vmx->vcpu.kvm)) {
>  		vmcs_write64(EOI_EXIT_BITMAP0, 0);
>  		vmcs_write64(EOI_EXIT_BITMAP1, 0);
>  		vmcs_write64(EOI_EXIT_BITMAP2, 0);
>  		vmcs_write64(EOI_EXIT_BITMAP3, 0);
>  
>  		vmcs_write16(GUEST_INTR_STATUS, 0);
> +
> +		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> +		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
>  	}
>  
>  	if (ple_gap) {
> @@ -4174,6 +4286,9 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  		vmcs_write64(APIC_ACCESS_ADDR,
>  			     page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
> +	if (vmx_vm_has_apicv(vcpu->kvm))
> +		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
> +
>  	if (vmx->vpid != 0)
>  		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
> @@ -7650,6 +7765,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.load_eoi_exitmap = vmx_load_eoi_exitmap,
>  	.hwapic_irr_update = vmx_hwapic_irr_update,
>  	.hwapic_isr_update = vmx_hwapic_isr_update,
> +	.sync_pir_to_irr = vmx_sync_pir_to_irr,
> +	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
> 
>  	.set_tss_addr = vmx_set_tss_addr,
>  	.get_tdp_level = get_ept_level,
> @@ -7753,7 +7870,7 @@ static int __init vmx_init(void)
>  	memcpy(vmx_msr_bitmap_longmode_x2apic,
>  			vmx_msr_bitmap_longmode, PAGE_SIZE);
> -	if (enable_apicv_reg_vid) {
> +	if (enable_apicv) {
>  		for (msr = 0x800; msr <= 0x8ff; msr++)
>  			vmx_disable_intercept_msr_read_x2apic(msr);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f1fa37e..62f8c94 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2679,6 +2679,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>  				    struct kvm_lapic_state *s)
>  {
> +	kvm_x86_ops->sync_pir_to_irr(vcpu);
>  	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
>  
>  	return 0;
> --
> 1.7.1


Best regards,
Yang
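
For quick reference, the shape of the new delivery logic in
__apic_accept_irq(), condensed from the lapic.c hunk above (not
line-for-line):

	/* Try the posted-interrupt fast path first; it returns false when
	 * unavailable (e.g. on SVM, or with apicv disabled), in which case
	 * we fall back to the classic IRR update. */
	if (!kvm_x86_ops->deliver_posted_interrupt(vcpu, vector,
						   &result, &delivered))
		result = !apic_test_and_set_irr(vector, apic);

	if (!delivered) {
		/* No notification IPI was sent, so kick the vcpu; the
		 * interrupt is then picked up at the next vm entry. */
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
	}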



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt supporting
  2013-02-21  6:04   ` Zhang, Yang Z
@ 2013-02-21  6:22     ` Gleb Natapov
  0 siblings, 0 replies; 16+ messages in thread
From: Gleb Natapov @ 2013-02-21  6:22 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, mtosatti

On Thu, Feb 21, 2013 at 06:04:52AM +0000, Zhang, Yang Z wrote:
> Hi Marcelo,
> 
> > Can you help review this patch? Many thanks if you can get to it quickly.
> 
The patch has only been on the list for two days. Be patient.

> Zhang, Yang Z wrote on 2013-02-19:
> > From: Yang Zhang <yang.z.zhang@Intel.com>
> > 
> > Posted Interrupt allows APIC interrupts to be injected into the guest
> > directly, without any vmexit.
> > [...]

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
       [not found]       ` <CAG7+5M1c7mtENHao+1yFCQkQus78HXK+QQBi3vwE6chAr_ZxVA@mail.gmail.com>
@ 2013-02-21  8:06         ` Zhang, Yang Z
  0 siblings, 0 replies; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-21  8:06 UTC (permalink / raw)
  To: Eric Northup, kvm; +Cc: Avi Kivity, Gleb Natapov, mtosatti

Eric Northup wrote on 2013-02-21:
> On Tue, Feb 19, 2013 at 6:46 PM, Zhang, Yang Z <yang.z.zhang@intel.com> wrote:
>> 
>> Avi Kivity wrote on 2013-02-20:
>>> On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang <yang.z.zhang@intel.com>
>>> wrote:
>>>> From: Yang Zhang <yang.z.zhang@Intel.com>
>>>> 
>>>> The "acknowledge interrupt on exit" feature controls processor behavior
>>>> for external interrupt acknowledgement. When this control is set, the
>>>> processor acknowledges the interrupt controller to acquire the
>>>> interrupt vector on VM exit.
>>>> 
>>>> After enabling this feature, an interrupt that arrives while the target
>>>> cpu is running in vmx non-root mode will be handled by the vmx handler
>>>> instead of the handler in the idt. Currently, the vmx handler only fakes
>>>> an interrupt stack and jumps to the idt table to let the real handler
>>>> handle it. Later, we will recognize the interrupt and deliver through
>>>> the idt table only interrupts that do not belong to the current vcpu;
>>>> interrupts that belong to the current vcpu will be handled inside the
>>>> vmx handler. This will reduce KVM's interrupt handling cost.
>>>> 
>>>> Also, the interrupt enable logic changes when this feature is turned on:
>>>> before this patch, the hypervisor called local_irq_enable() to enable
>>>> interrupts directly. Now the IF bit is set on the interrupt stack frame,
>>>> and interrupts will be enabled on return from the interrupt handler if
>>>> an external interrupt exists. If there is no external interrupt, we
>>>> still call local_irq_enable() to enable them.
>>>> 
>>>> Refer to Intel SDM volum 3, chapter 33.2.
>>>> 
>>>> 
>>>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>>>> +
>>>> +       /*
>>>> +        * If external interrupt exists, IF bit is set in rflags/eflags on the
>>>> +        * interrupt stack frame, and interrupt will be enabled on a return
>>>> +        * from interrupt handler.
>>>> +        */
>>>> +       if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>>>> +                       == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>>>> +               unsigned int vector;
>>>> +               unsigned long entry;
>>>> +               gate_desc *desc;
>>>> +               struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +
>>>> +               vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
>>>> +#ifdef CONFIG_X86_64
>>>> +               desc = (void *)vmx->host_idt_base + vector * 16;
>>>> +#else
>>>> +               desc = (void *)vmx->host_idt_base + vector * 8;
>>>> +#endif
>>>> +
>>>> +               entry = gate_offset(*desc);
>>>> +               asm(
>>>> +                       "mov %0, %%" _ASM_DX " \n\t"
>>>> +#ifdef CONFIG_X86_64
>>>> +                       "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>>>> +                       "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>>>> +                       "mov %%ss, %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_AX " \n\t"
>>>> +                       "push %%" _ASM_BX " \n\t"
>>>> +#endif
>>> 
>>> Are we sure no interrupts are using the IST feature?  I guess it's
>>> unlikely.
>> Linux uses IST for NMI, stack fault, machine-check, double fault and debug
>> interrupts. No external interrupt will use it.
>> This feature is only for external interrupts, so we don't need to check it
>> here.
> 
> Would it be silly paranoia to BUG_ON(desc->ist)?
I think the BUG_ON is reasonable. I will add it in the next version.
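
A minimal sketch of where such a check could sit (hypothetical placement;
on x86_64 gate_desc carries the 3-bit 'ist' field):

		desc = (void *)vmx->host_idt_base + vector * 16;
		/* External interrupts must run on the regular kernel stack;
		 * an IST entry here would invalidate the faked frame below. */
		BUG_ON(desc->ist);
		entry = gate_offset(*desc);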

Best regards,
Yang



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
  2013-02-20 15:10           ` Avi Kivity
@ 2013-02-21  8:58             ` Zhang, Yang Z
  2013-02-21  9:22               ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-21  8:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

Avi Kivity wrote on 2013-02-20:
> On Wed, Feb 20, 2013 at 3:10 PM, Zhang, Yang Z <yang.z.zhang@intel.com>
> wrote:
>>>>> 
>>>>> push %%cs
>>>> "push %%cs" is invalid in x86_64.
>>> 
>>> Oops. 'push[lq] $__KERNEL_CS' then.
>> Is this right? I just copied it from another file.
>> 
>> #define __STR(X) #X
>> #define STR(X) __STR(X)
>> 
>> #ifdef CONFIG_X86_64
>>                         "pushq $"STR(__KERNEL_CS)" \n\t"
>> #else
>>                         "pushl $"STR(__KERNEL_CS)" \n\t"
>> #endif
>> 
>> #undef STR
>> #undef __STR
>> 
> 
> Use __ASM_SIZE and an immediate operand for __KERNEL_CS:
> 
>  asm ( ... : : [cs]"i"(__KERNEL_CS) );
> and the code will be cleaner.
Thanks. Here is the code after the change; please review it:

                asm(
                        "mov %0, %%" _ASM_DX " \n\t"
#ifdef CONFIG_X86_64
                        "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
                        "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
                        "mov %%ss, %%" _ASM_AX " \n\t"
                        "push %%" _ASM_AX " \n\t"
                        "push %%" _ASM_BX " \n\t"
#endif
                        "pushf \n\t"
                        "orl $0x200, (%%" _ASM_SP ") \n\t"
                        __ASM_SIZE(push) " $%c[cs] \n\t"
                        "call *%%" _ASM_DX " \n\t"
                        : : "m"(entry), [cs]"i"(__KERNEL_CS) :
#ifdef CONFIG_X86_64
                        "rax", "rbx", "rdx"
#else
                        "edx"
#endif
                        );


Best regards,
Yang



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
  2013-02-21  8:58             ` Zhang, Yang Z
@ 2013-02-21  9:22               ` Avi Kivity
  2013-02-22  2:50                 ` Zhang, Yang Z
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2013-02-21  9:22 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

On Thu, Feb 21, 2013 at 10:58 AM, Zhang, Yang Z <yang.z.zhang@intel.com> wrote:
> Thanks. Here is code after changing, please review it:
>
>                 asm(
>                         "mov %0, %%" _ASM_DX " \n\t"
> #ifdef CONFIG_X86_64
>                         "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>                         "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>                         "mov %%ss, %%" _ASM_AX " \n\t"
>                         "push %%" _ASM_AX " \n\t"
>                         "push %%" _ASM_BX " \n\t"
> #endif
>                         "pushf \n\t"
>                         "orl $0x200, (%%" _ASM_SP ") \n\t"
>                         __ASM_SIZE(push) " $%c[cs] \n\t"
>                         "call *%%" _ASM_DX " \n\t"
>                         : : "m"(entry), [cs]"i"(__KERNEL_CS) :
> #ifdef CONFIG_X86_64
>                         "rax", "rbx", "rdx"
> #else
>                         "edx"
> #endif
>                         );

For cleanliness, you can provide more parameters via the constraints:

  "m"(entry) -> [entry]"r"(entry) (and then use "call *[entry]")
  use "push %c[ss]" for ss
  and [sp]"=&r"(tmp) instead of rdx/edx as the temporary for rsp

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
  2013-02-21  9:22               ` Avi Kivity
@ 2013-02-22  2:50                 ` Zhang, Yang Z
  0 siblings, 0 replies; 16+ messages in thread
From: Zhang, Yang Z @ 2013-02-22  2:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Gleb Natapov, Marcelo Tosatti

Avi Kivity wrote on 2013-02-21:
> On Thu, Feb 21, 2013 at 10:58 AM, Zhang, Yang Z <yang.z.zhang@intel.com>
> wrote:
>> Thanks. Here is code after changing, please review it:
>> 
>>                 asm(
>>                         "mov %0, %%" _ASM_DX " \n\t"
>> #ifdef CONFIG_X86_64
>>                         "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t"
>>                         "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
>>                         "mov %%ss, %%" _ASM_AX " \n\t"
>>                         "push %%" _ASM_AX " \n\t"
>>                         "push %%" _ASM_BX " \n\t"
>> #endif
>>                         "pushf \n\t"
>>                         "orl $0x200, (%%" _ASM_SP ") \n\t"
>>                         __ASM_SIZE(push) " $%c[cs] \n\t"
>>                         "call *%%" _ASM_DX " \n\t"
>>                         : : "m"(entry), [cs]"i"(__KERNEL_CS) :
>> #ifdef CONFIG_X86_64
>>                         "rax", "rbx", "rdx"
>> #else
>>                         "edx"
>> #endif
>>                         );
> 
> For cleanliness, you can provide more parameters via the constraints:
> 
>   "m"(entry) -> [entry]"r"(entry) (and then use "call *[entry]")
>   use "push %c[ss]" for ss
>   and [sp]"=&r"(tmp) instead of rdx/edx as the temporary for rsp
I didn't find a __KERNEL_SS definition, but I noticed that KVM uses __KERNEL_DS as HOST_SS_SELECTOR.
Please review the following changes.

                asm volatile(
#ifdef CONFIG_X86_64
                        "mov %%" _ASM_SP ", %[sp] \n\t"
                        "and $0xfffffffffffffff0, %%" _ASM_SP " \n\t"
                        "push $%c[ss] \n\t"
                        "push %[sp] \n\t"
#endif
                        "pushf \n\t"
                        "orl $0x200, (%%" _ASM_SP ") \n\t"
                        __ASM_SIZE(push) " $%c[cs] \n\t"
                        "call *%[entry] \n\t"
                        :
#ifdef CONFIG_X86_64
                        [sp]"=&r"(tmp)
#endif
                        : [entry]"r"(entry),
                        [ss]"i"(__KERNEL_DS),
                        [cs]"i"(__KERNEL_CS)
                        );
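
For context, this block would sit in the handler roughly as follows (the
declarations are assumed here, not part of the posted snippet):

		unsigned long entry = gate_offset(*desc);
#ifdef CONFIG_X86_64
		unsigned long tmp;	/* receives the pre-alignment rsp via [sp] */
#endif
		/* ...asm block from above goes here: on 64-bit it builds a
		 * fake interrupt frame (SS, RSP, RFLAGS with IF forced on by
		 * the orl of 0x200, CS), and 'call' supplies the return RIP,
		 * so the handler's iret comes back here with interrupts
		 * enabled. */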


Best regards,
Yang



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2013-02-19 13:39 [PATCH v3 0/2] KVM: VMX: Add Posted Interrupt supporting Yang Zhang
2013-02-19 13:39 ` [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit Yang Zhang
2013-02-19 17:35   ` Avi Kivity
2013-02-20  2:46     ` Zhang, Yang Z
2013-02-20 10:10       ` Gleb Natapov
2013-02-20 11:07         ` Zhang, Yang Z
2013-02-20 12:35       ` Avi Kivity
2013-02-20 13:10         ` Zhang, Yang Z
2013-02-20 15:10           ` Avi Kivity
2013-02-21  8:58             ` Zhang, Yang Z
2013-02-21  9:22               ` Avi Kivity
2013-02-22  2:50                 ` Zhang, Yang Z
     [not found]       ` <CAG7+5M1c7mtENHao+1yFCQkQus78HXK+QQBi3vwE6chAr_ZxVA@mail.gmail.com>
2013-02-21  8:06         ` Zhang, Yang Z
2013-02-19 13:39 ` [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt supporting Yang Zhang
2013-02-21  6:04   ` Zhang, Yang Z
2013-02-21  6:22     ` Gleb Natapov
