All of lore.kernel.org
 help / color / mirror / Atom feed
* [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support
@ 2016-02-12 13:59 Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 1/9] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
                   ` (9 more replies)
  0 siblings, 10 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

OVERVIEW
========
This patch set is the first of the two-part patch series to introduce 
the new AMD Advance Virtual Interrupt Controller (AVIC) support.

Basically, SVM AVIC hardware virtualizes local APIC registers of each
vCPU via the virtual APIC (vAPIC) backing page. This allows guest access
to certain APIC registers without the need to emulate the hardware behavior
in the hypervisor. More information about AVIC can be found in the
AMD64 Architecture Programmer’s Manual Volume 2 - System Programming.

  http://support.amd.com/TechDocs/24593.pdf

For SVM AVIC, we extend the existing kvm_amd driver to:
  * Check CPUID to detect AVIC support in the processor
  * Program new fields in VMCB to enable AVIC
  * Introduce new AVIC data structures and add code to manage them
  * Handle two new AVIC #VMEXITs
  * Add new interrupt intjection code using vAPIC backing page
    instead of the existing V_IRQ, V_INTR_PRIO, V_INTR_VECTOR,
    and V_IGN_TPR fileds

Currently, this patch series does not enable AVIC by default.
Users can enable SVM AVIC by specifying avic=1 during insmod kvm-amd.

Later, in part 2, we will introduce the IOMMU AVIC support, which
provides speed up for PCI device passthrough use case by allowing
the IOMMU hardware to inject interrupt directly into the guest via
the vAPIC backing page.

PERFORMANCE RESULTS
===================
Currently, AVIC is supported in the AMD family 15h models 6Xh
(Carrizo) processors. Therefore, it is used to collect the 
perforamance data shown below.

Generaly, SVM AVIC alone (w/o IOMMU AVIC) should provide speedup for
IPI interrupt since hypervisor does not require VMEXIT to inject
these interrupts. Also, it should speed up the case when hypervisor
wants to inject an interrupt into a running guest by setting the
corresponded IRR bit in the vAPIC backing page and trigger
AVIC_DOORBELL MSR.

IPI PERFORMANCE
===============
For IPI, I have collected some performance number on 2 and 4 CPU running
hackbech with the following detail:

  hackbench -p -l 100000
  Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
  Each sender will pass 100000 messages of 100 bytes

                |    2 vcpus    |    4 vcpus 
 ------------------------------------------------
         Vanila |  273.76       |  190.21	
  AVIC disabled |  260.51 (~5%) |  184.40 (~5%)
          AVIC  |  239.03 (~10%)|  166.37 (~10%)

OVERALL PERFORMANCE
===================
Enabling AVIC should helps speeding up workloads, which generate
large amount of interrupts. However, it requires additional logics to:
  * Maintain AVIC-specific data structures during vCPU load/unload
    due to schedule in/out.
  * Track and manange interrupt pending in vAPIC backing page.

The goal is to minimize the overhead of AVIC in most cases, so that
we can achieve equivalent or improvement in overall performance when
enabling AVIC.

This is an on-going investigation and to be discussed.

CURRENT UNSUPPORT USE-CASES
===========================
    - Nested VM
    - VM Migration

GITHUB
======
Latest git tree can be found at:
    http://github.com/ssuthiku/linux.git    avic_part1_rfc

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Suravee Suthikulpanit (9):
  KVM: x86: Misc LAPIC changes to exposes helper functions
  svm: Introduce new AVIC VMCB registers
  svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING
  KVM: x86: Detect and Initialize AVIC support
  svm: Add VMEXIT handlers for AVIC
  svm: Add interrupt injection via AVIC
  svm: Do not expose x2APIC when enable AVIC
  svm: Do not intercept CR8 when enable AVIC
  svm: Manage vcpu load/unload when enable AVIC

 arch/x86/include/asm/cpufeature.h |   1 +
 arch/x86/include/asm/kvm_host.h   |   4 +
 arch/x86/include/asm/msr-index.h  |   1 +
 arch/x86/include/asm/svm.h        |  38 +-
 arch/x86/include/uapi/asm/svm.h   |   9 +-
 arch/x86/kernel/cpu/scattered.c   |   1 +
 arch/x86/kvm/lapic.c              |  53 +--
 arch/x86/kvm/lapic.h              |   5 +
 arch/x86/kvm/svm.c                | 882 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                |   4 +-
 10 files changed, 940 insertions(+), 58 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PART1 RFC 1/9] KVM: x86: Misc LAPIC changes to exposes helper functions
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 2/9] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Exporting LAPIC utility functions and macros to reuse.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++++++------------------------
 arch/x86/kvm/lapic.h |  5 +++++
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 36591fa..fc313a0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -59,9 +59,8 @@
 /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
 #define apic_debug(fmt, arg...)
 
-#define APIC_LVT_NUM			6
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION			(0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
 #define APIC_SHORT_MASK			0xc0000
@@ -94,10 +93,11 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 		apic_test_vector(vector, apic->regs + APIC_IRR);
 }
 
-static inline void apic_set_vector(int vec, void *bitmap)
+void kvm_lapic_set_vector(int vec, void *bitmap)
 {
 	set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_set_vector);
 
 static inline void apic_clear_vector(int vec, void *bitmap)
 {
@@ -290,7 +290,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	apic_set_reg(apic, APIC_LVR, v);
 }
 
-static const unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
+static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
 	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
@@ -351,7 +351,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
 {
-	apic_set_vector(vec, apic->regs + APIC_IRR);
+	kvm_lapic_set_vector(vec, apic->regs + APIC_IRR);
 	/*
 	 * irr_pending must be true if any interrupt is pending; set it after
 	 * APIC_IRR to avoid race with apic_clear_irr
@@ -844,7 +844,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 
 		if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) {
 			if (trig_mode)
-				apic_set_vector(vector, apic->regs + APIC_TMR);
+				kvm_lapic_set_vector(vector, apic->regs + APIC_TMR);
 			else
 				apic_clear_vector(vector, apic->regs + APIC_TMR);
 		}
@@ -1109,7 +1109,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 	return container_of(dev, struct kvm_lapic, dev);
 }
 
-static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 		void *data)
 {
 	unsigned char alignment = offset & 0xf;
@@ -1146,6 +1146,7 @@ static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);
 
 static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
 {
@@ -1163,7 +1164,7 @@ static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 	if (!apic_mmio_in_range(apic, address))
 		return -EOPNOTSUPP;
 
-	apic_reg_read(apic, offset, len, data);
+	kvm_lapic_reg_read(apic, offset, len, data);
 
 	return 0;
 }
@@ -1348,7 +1349,7 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
 	}
 }
 
-static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 {
 	int ret = 0;
 
@@ -1395,7 +1396,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 			u32 lvt_val;
 
-			for (i = 0; i < APIC_LVT_NUM; i++) {
+			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
 				lvt_val = kvm_apic_get_reg(apic,
 						       APIC_LVTT + 0x10 * i);
 				apic_set_reg(apic, APIC_LVTT + 0x10 * i,
@@ -1467,7 +1468,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SELF_IPI:
 		if (apic_x2apic_mode(apic)) {
-			apic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
+			kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
 		} else
 			ret = 1;
 		break;
@@ -1479,6 +1480,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_debug("Local APIC Write to read-only register %x\n", reg);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_write);
 
 static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 			    gpa_t address, int len, const void *data)
@@ -1508,7 +1510,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 		apic_debug("%s: offset 0x%x with length 0x%x, and value is "
 			   "0x%x\n", __func__, offset, len, val);
 
-	apic_reg_write(apic, offset & 0xff0, val);
+	kvm_lapic_reg_write(apic, offset & 0xff0, val);
 
 	return 0;
 }
@@ -1516,7 +1518,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
 	if (kvm_vcpu_has_lapic(vcpu))
-		apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
+		kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
@@ -1528,10 +1530,10 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
-	apic_reg_write(vcpu->arch.apic, offset, val);
+	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
@@ -1670,7 +1672,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < APIC_LVT_NUM; i++)
+	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
 		apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
@@ -2073,8 +2075,8 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -2091,10 +2093,10 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 		return 1;
 	}
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
@@ -2110,8 +2112,8 @@ int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
@@ -2122,10 +2124,10 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return 1;
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 41bdb35..0b68f5a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -7,6 +7,7 @@
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
+#define KVM_APIC_LVT_NUM	6
 
 struct kvm_timer {
 	struct hrtimer timer;
@@ -56,6 +57,10 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+void kvm_lapic_set_vector(int vec, void *bitmap);
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+		       void *data);
 
 void __kvm_apic_update_irr(u32 *pir, void *regs);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 2/9] svm: Introduce new AVIC VMCB registers
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 1/9] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 3/9] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Introduce new AVIC VMCB registers. Also breakdown int_ctl register
into bit-field for ease of use.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/include/asm/svm.h | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6136d99..db5d7af 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -67,10 +67,24 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 asid;
 	u8 tlb_ctl;
 	u8 reserved_2[3];
-	u32 int_ctl;
+	union { /* Offset 0x60 */
+		u32 int_ctl;
+
+		struct __attribute__ ((__packed__)) {
+			u32 v_tpr		: 8,
+			    v_irq		: 1,
+			    reserved_3		: 7,
+			    v_intr_prio		: 4,
+			    v_ign_tpr		: 1,
+			    reserved_4		: 3,
+			    v_intr_masking	: 1,
+			    reserved_5		: 6,
+			    avic_enable		: 1;
+		};
+	};
 	u32 int_vector;
 	u32 int_state;
-	u8 reserved_3[4];
+	u8 reserved_6[4];
 	u32 exit_code;
 	u32 exit_code_hi;
 	u64 exit_info_1;
@@ -78,17 +92,22 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 exit_int_info;
 	u32 exit_int_info_err;
 	u64 nested_ctl;
-	u8 reserved_4[16];
+	u64 avic_vapic_bar;
+	u8 reserved_7[8];
 	u32 event_inj;
 	u32 event_inj_err;
 	u64 nested_cr3;
 	u64 lbr_ctl;
 	u32 clean;
-	u32 reserved_5;
+	u32 reserved_8;
 	u64 next_rip;
 	u8 insn_len;
 	u8 insn_bytes[15];
-	u8 reserved_6[800];
+	u64 avic_bk_page;	/* Offset 0xe0 */
+	u8 reserved_9[8];	/* Offset 0xe8 */
+	u64 avic_log_apic_id;	/* Offset 0xf0 */
+	u64 avic_phy_apic_id;	/* Offset 0xf8 */
+	u8 reserved_10[768];
 };
 
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 3/9] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 1/9] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 2/9] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Now that we have defined the bitfield, use them to replace existing
macros. This patch should not have functional change.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/include/asm/svm.h |  9 ---------
 arch/x86/kvm/svm.c         | 24 ++++++++++++------------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index db5d7af..7bb34c9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -118,18 +118,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 
 #define V_TPR_MASK 0x0f
 
-#define V_IRQ_SHIFT 8
-#define V_IRQ_MASK (1 << V_IRQ_SHIFT)
-
-#define V_INTR_PRIO_SHIFT 16
-#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
-
 #define V_IGN_TPR_SHIFT 20
 #define V_IGN_TPR_MASK (1 << V_IGN_TPR_SHIFT)
 
-#define V_INTR_MASKING_SHIFT 24
-#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
-
 #define SVM_INTERRUPT_SHADOW_MASK 1
 
 #define SVM_IOIO_STR_SHIFT 2
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c13a64b..ca185fb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1052,7 +1052,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->int_ctl = V_INTR_MASKING_MASK;
+	control->v_intr_masking = 1;
 
 	init_seg(&save->es);
 	init_seg(&save->ss);
@@ -2306,7 +2306,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	/* We always set V_INTR_MASKING and remember the old value in hflags */
 	if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
-		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
+		nested_vmcb->control.v_intr_masking = 0;
 
 	/* Restore the original control entries */
 	copy_vmcb_control_area(vmcb, hsave);
@@ -2516,8 +2516,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	svm->nested.intercept            = nested_vmcb->control.intercept;
 
 	svm_flush_tlb(&svm->vcpu);
-	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
-	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl;
+	svm->vmcb->control.v_intr_masking = 1;
+	if (nested_vmcb->control.v_intr_masking)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
@@ -2670,7 +2671,7 @@ static int clgi_interception(struct vcpu_svm *svm)
 
 	/* After a CLGI no interrupts should come */
 	svm_clear_vintr(svm);
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	svm->vmcb->control.v_irq = 0;
 
 	mark_dirty(svm->vmcb, VMCB_INTR);
 
@@ -3247,7 +3248,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 {
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	svm_clear_vintr(svm);
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	svm->vmcb->control.v_irq = 0;
 	mark_dirty(svm->vmcb, VMCB_INTR);
 	++svm->vcpu.stat.irq_window_exits;
 	return 1;
@@ -3558,11 +3559,11 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
 	struct vmcb_control_area *control;
 
+
 	control = &svm->vmcb->control;
 	control->int_vector = irq;
-	control->int_ctl &= ~V_INTR_PRIO_MASK;
-	control->int_ctl |= V_IRQ_MASK |
-		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
+	control->v_intr_prio = 0xf;
+	control->v_irq = 1;
 	mark_dirty(svm->vmcb, VMCB_INTR);
 }
 
@@ -3728,7 +3729,7 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 		return;
 
 	if (!is_cr_intercept(svm, INTERCEPT_CR8_WRITE)) {
-		int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
+		int cr8 = svm->vmcb->control.v_tpr & V_TPR_MASK;
 		kvm_set_cr8(vcpu, cr8);
 	}
 }
@@ -3742,8 +3743,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 		return;
 
 	cr8 = kvm_get_cr8(vcpu);
-	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
-	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
 }
 
 static void svm_complete_interrupts(struct vcpu_svm *svm)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 3/9] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 14:13   ` Borislav Petkov
  2016-02-12 13:59 ` [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

This patch introduces AVIC-related data structure, and AVIC
intitialization code.

There are three main data structures for AVIC:
    * Virtual APIC (vAPIC) backing page (per-VCPU)
    * Physical APIC ID table (per-VM)
    * Logical APIC ID table (per-VM)

In order to accommodate the new per-VM tables, we introduce
a new per-VM arch-specific void pointer, struct kvm_arch.arch_data.
This will point to the newly introduced struct svm_vm_data.

This patch also introduces code to detect the new new SVM feature CPUID
Fn8000_000A_EDX[13], which identifies support for AMD Advance Virtual
Interrupt Controller (AVIC).

Currently, AVIC is disabled by default. Users can manually
enable AVIC via kernel boot option kvm-amd.avic=1 or during
kvm-amd module loading with parameter avic=1.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/include/asm/cpufeature.h |   1 +
 arch/x86/include/asm/kvm_host.h   |   2 +
 arch/x86/kernel/cpu/scattered.c   |   1 +
 arch/x86/kvm/svm.c                | 404 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 407 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..ee85900 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -203,6 +203,7 @@
 
 #define X86_FEATURE_VMMCALL     ( 8*32+15) /* Prefer vmmcall to vmcall */
 #define X86_FEATURE_XENPV       ( 8*32+16) /* "" Xen paravirtual guest */
+#define X86_FEATURE_AVIC        ( 8*32+17) /* AMD Virtual Interrupt Controller support */
 
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 44adbb8..7b78328 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -754,6 +754,8 @@ struct kvm_arch {
 
 	bool irqchip_split;
 	u8 nr_reserved_ioapic_pins;
+
+	void *arch_data;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 8cb57df..88cfbe7 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
 		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
 		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
+		{ X86_FEATURE_AVIC,		CR_EDX,13, 0x8000000a, 0 },
 		{ 0, 0, 0, 0, 0 }
 	};
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ca185fb..9440b48 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -78,6 +78,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define TSC_RATIO_MIN		0x0000000000000001ULL
 #define TSC_RATIO_MAX		0x000000ffffffffffULL
 
+#define AVIC_HPA_MASK	~((0xFFFULL << 52) || 0xFFF)
+
+/* NOTE: Current max index allowed for physical APIC ID table is 255 */
+#define AVIC_PHY_APIC_ID_MAX	0xFF
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -162,6 +167,36 @@ struct vcpu_svm {
 
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled	: 1;
+
+	struct page *avic_bk_page;
+};
+
+struct __attribute__ ((__packed__))
+svm_avic_log_ait_entry {
+	u32 guest_phy_apic_id	: 8,
+	    res			: 23,
+	    valid		: 1;
+};
+
+struct __attribute__ ((__packed__))
+svm_avic_phy_ait_entry {
+	u64 host_phy_apic_id	: 8,
+	    res1		: 4,
+	    bk_pg_ptr		: 40,
+	    res2		: 10,
+	    is_running		: 1,
+	    valid		: 1;
+};
+
+/* Note: This structure is per VM */
+struct svm_vm_data {
+	atomic_t count;
+	u32 ldr_mode;
+	u32 avic_max_vcpu_id;
+	u32 avic_tag;
+
+	struct page *avic_log_ait_page;
+	struct page *avic_phy_ait_page;
 };
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
@@ -205,6 +240,10 @@ module_param(npt, int, S_IRUGO);
 static int nested = true;
 module_param(nested, int, S_IRUGO);
 
+/* enable / disable AVIC */
+static int avic = false;
+module_param(avic, int, S_IRUGO);
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -234,6 +273,13 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
+
+static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
+{
+	svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
+}
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
@@ -923,6 +969,13 @@ static __init int svm_hardware_setup(void)
 	} else
 		kvm_disable_tdp();
 
+	if (avic && (!npt_enabled || !boot_cpu_has(X86_FEATURE_AVIC)))
+		avic = false;
+
+	if (avic) {
+		printk(KERN_INFO "kvm: AVIC enabled\n");
+	}
+
 	return 0;
 
 err:
@@ -1000,6 +1053,27 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
+static void avic_init_vmcb(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	struct svm_vm_data *vm_data = svm->vcpu.kvm->arch.arch_data;
+	phys_addr_t bpa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+	phys_addr_t lpa = PFN_PHYS(page_to_pfn(vm_data->avic_log_ait_page));
+	phys_addr_t ppa = PFN_PHYS(page_to_pfn(vm_data->avic_phy_ait_page));
+
+	if (!vmcb)
+		return;
+
+	pr_debug("SVM: %s: bpa=%#llx, lpa=%#llx, ppa=%#llx\n",
+		 __func__, bpa, lpa, ppa);
+
+	vmcb->control.avic_enable = 1;
+	vmcb->control.avic_bk_page = bpa & AVIC_HPA_MASK;
+	vmcb->control.avic_log_apic_id = lpa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id = ppa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1113,6 +1187,309 @@ static void init_vmcb(struct vcpu_svm *svm)
 	mark_all_dirty(svm->vmcb);
 
 	enable_gif(svm);
+
+	if (avic)
+		avic_init_vmcb(svm);
+}
+
+static struct svm_avic_phy_ait_entry *
+avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
+{
+	struct svm_avic_phy_ait_entry *avic_phy_ait;
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+
+	if (!vm_data)
+		return NULL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	if (index >= 0xff)
+		return NULL;
+
+	avic_phy_ait = page_address(vm_data->avic_phy_ait_page);
+
+	return &avic_phy_ait[index];
+}
+
+struct svm_avic_log_ait_entry *
+avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
+{
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+	int index;
+	struct svm_avic_log_ait_entry *avic_log_ait;
+
+	if (!vm_data)
+		return NULL;
+
+	if (is_flat) { /* flat */
+		if (mda > 7)
+			return NULL;
+		index = mda;
+	} else { /* cluster */
+		int apic_id = mda & 0xf;
+		int cluster_id = (mda & 0xf0) >> 8;
+
+		if (apic_id > 4 || cluster_id >= 0xf)
+			return NULL;
+		index = (cluster_id << 2) + apic_id;
+	}
+	avic_log_ait = (struct svm_avic_log_ait_entry *)
+				page_address(vm_data->avic_log_ait_page);
+
+	return &avic_log_ait[index];
+}
+
+static inline void avic_set_bk_page_entry(struct vcpu_svm *svm, int reg_off, u32 val)
+{
+	void *avic_bk = page_address(svm->avic_bk_page);
+
+	*((u32 *) (avic_bk + reg_off)) = val;
+}
+
+static inline u32 *avic_get_bk_page_entry(struct vcpu_svm *svm, u32 offset)
+{
+	char *tmp = (char*)page_address(svm->avic_bk_page);
+
+	return (u32*)(tmp+offset);
+}
+
+static int avic_init_log_apic_entry(struct kvm_vcpu *vcpu, u8 g_phy_apic_id,
+				    u8 log_apic_id)
+{
+	u32 mod;
+	struct svm_avic_log_ait_entry *entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!svm)
+		return -EINVAL;
+
+	mod = (*avic_get_bk_page_entry(svm, APIC_DFR) >> 28) & 0xf;
+	entry = avic_get_log_ait_entry(vcpu, log_apic_id, (mod == 0xf));
+	if (!entry)
+		return -EINVAL;
+	entry->guest_phy_apic_id = g_phy_apic_id;
+	entry->valid = 1;
+
+	return 0;
+}
+
+static int avic_init_bk_page(struct kvm_vcpu *vcpu)
+{
+	int i;
+	u64 addr;
+	struct page *page;
+	int id = vcpu->vcpu_id;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	addr = APIC_DEFAULT_PHYS_BASE + (id * PAGE_SIZE);
+	page = gfn_to_page(kvm, addr >> PAGE_SHIFT);
+	if (is_error_page(page))
+		return -EFAULT;
+
+	/*
+	 * Do not pin the page in memory, so that memory hot-unplug
+	 * is able to migrate it.
+	 */
+	put_page(page);
+
+	/* Setting up AVIC Backing Page */
+	svm->avic_bk_page = page;
+	clear_page(kmap(page));
+	pr_debug("SVM: %s: vAPIC bk page: cpu=%u, addr=%#llx, pa=%#llx\n",
+		 __func__, id, addr,
+		 (unsigned long long) PFN_PHYS(page_to_pfn(page)));
+
+	avic_set_bk_page_entry(svm, APIC_ID, kvm_apic_get_reg(apic, APIC_ID));
+	avic_set_bk_page_entry(svm, APIC_LVR, kvm_apic_get_reg(apic, APIC_LVR));
+	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
+		avic_set_bk_page_entry(svm, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+	avic_set_bk_page_entry(svm, APIC_LVT0,
+			 SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
+	avic_set_bk_page_entry(svm, APIC_DFR, 0xffffffffU);
+	avic_set_bk_page_entry(svm, APIC_SPIV, 0xff);
+	avic_set_bk_page_entry(svm, APIC_TASKPRI, 0);
+	avic_set_bk_page_entry(svm, APIC_LDR, kvm_apic_get_reg(apic, APIC_LDR));
+	avic_set_bk_page_entry(svm, APIC_ESR, 0);
+	avic_set_bk_page_entry(svm, APIC_ICR, 0);
+	avic_set_bk_page_entry(svm, APIC_ICR2, 0);
+	avic_set_bk_page_entry(svm, APIC_TDCR, 0);
+	avic_set_bk_page_entry(svm, APIC_TMICT, 0);
+	for (i = 0; i < 8; i++) {
+		avic_set_bk_page_entry(svm, APIC_IRR + 0x10 * i, 0);
+		avic_set_bk_page_entry(svm, APIC_ISR + 0x10 * i, 0);
+		avic_set_bk_page_entry(svm, APIC_TMR + 0x10 * i, 0);
+	}
+
+	avic_init_vmcb(svm);
+
+	return 0;
+}
+
+static inline void avic_unalloc_bk_page(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (svm->avic_bk_page)
+		kunmap(svm->avic_bk_page);
+}
+
+static int avic_alloc_bk_page(struct vcpu_svm *svm, int id)
+{
+	int ret = 0, i;
+	bool realloc = false;
+	struct kvm_vcpu *vcpu;
+	struct kvm *kvm = svm->vcpu.kvm;
+	struct svm_vm_data *vm_data = kvm->arch.arch_data;
+
+	mutex_lock(&kvm->slots_lock);
+
+	/* Check if we have already allocated vAPIC backing
+	 * page for this vCPU. If not, we need to realloc
+	 * a new one and re-assign all other vCPU.
+	 */
+	if (kvm->arch.apic_access_page_done &&
+	    (id > vm_data->avic_max_vcpu_id)) {
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			avic_unalloc_bk_page(vcpu);
+
+		__x86_set_memory_region(kvm,
+					APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+					0 , 0);
+		realloc = true;
+		vm_data->avic_max_vcpu_id = 0;
+	}
+
+	/*
+	 * We are allocating vAPIC backing page
+	 * upto the max vCPU ID
+	 */
+	if (id >= vm_data->avic_max_vcpu_id) {
+		ret = __x86_set_memory_region(kvm,
+					      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+					      APIC_DEFAULT_PHYS_BASE,
+					      PAGE_SIZE * (id + 1));
+		if (ret)
+			goto out;
+
+		vm_data->avic_max_vcpu_id = id;
+	}
+
+	/* Reinit vAPIC backing page for exisinting vcpus */
+	if (realloc)
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			avic_init_bk_page(vcpu);
+
+	avic_init_bk_page(&svm->vcpu);
+
+	kvm->arch.apic_access_page_done = true;
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+
+static void avic_vm_uninit(struct kvm *kvm)
+{
+	struct svm_vm_data *vm_data = kvm->arch.arch_data;
+
+	if (!vm_data)
+		return;
+
+	if (vm_data->avic_log_ait_page)
+		__free_page(vm_data->avic_log_ait_page);
+	if (vm_data->avic_phy_ait_page)
+		__free_page(vm_data->avic_phy_ait_page);
+	kfree(vm_data);
+	kvm->arch.arch_data = NULL;
+}
+
+static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+
+	avic_unalloc_bk_page(vcpu);
+
+	if (vm_data &&
+	    (atomic_read(&vm_data->count) == 0 ||
+	     atomic_dec_and_test(&vm_data->count)))
+		avic_vm_uninit(vcpu->kvm);
+}
+
+static atomic_t avic_tag_gen = ATOMIC_INIT(1);
+
+static inline u32 avic_get_next_tag(void)
+{
+	u32 tag = atomic_read(&avic_tag_gen);
+
+	atomic_inc(&avic_tag_gen);
+	return tag;
+}
+
+static int avic_vm_init(struct kvm *kvm)
+{
+	int err = -ENOMEM;
+	struct svm_vm_data *vm_data;
+	struct page *avic_phy_ait_page;
+	struct page *avic_log_ait_page;
+
+	vm_data = kzalloc(sizeof(struct svm_vm_data),
+				      GFP_KERNEL);
+	if (!vm_data)
+		return err;
+
+	kvm->arch.arch_data = vm_data;
+	atomic_set(&vm_data->count, 0);
+
+	/* Allocating physical APIC ID table (4KB) */
+	avic_phy_ait_page = alloc_page(GFP_KERNEL);
+	if (!avic_phy_ait_page)
+		goto free_avic;
+
+	vm_data->avic_phy_ait_page = avic_phy_ait_page;
+	clear_page(page_address(avic_phy_ait_page));
+
+	/* Allocating logical APIC ID table (4KB) */
+	avic_log_ait_page = alloc_page(GFP_KERNEL);
+	if (!avic_log_ait_page)
+		goto free_avic;
+
+	vm_data->avic_log_ait_page = avic_log_ait_page;
+	clear_page(page_address(avic_log_ait_page));
+
+	vm_data->avic_tag = avic_get_next_tag();
+
+	return 0;
+
+free_avic:
+	avic_vm_uninit(kvm);
+	return err;
+}
+
+static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
+{
+	int err;
+	struct svm_vm_data *vm_data = NULL;
+
+	/* Note: svm_vm_data is per VM */
+	if (!kvm->arch.arch_data) {
+		err = avic_vm_init(kvm);
+		if (err)
+			return err;
+	}
+
+	err = avic_alloc_bk_page(svm, id);
+	if (err) {
+		avic_vcpu_uninit(&svm->vcpu);
+		return err;
+	}
+
+	vm_data = kvm->arch.arch_data;
+	atomic_inc(&vm_data->count);
+
+	return 0;
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1131,6 +1508,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+
+	if (avic && !init_event)
+		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
 static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
@@ -1169,6 +1549,12 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!hsave_page)
 		goto free_page3;
 
+	if (avic) {
+		err = avic_vcpu_init(kvm, svm, id);
+		if (err)
+			goto free_page4;
+	}
+
 	svm->nested.hsave = page_address(hsave_page);
 
 	svm->msrpm = page_address(msrpm_pages);
@@ -1187,6 +1573,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	return &svm->vcpu;
 
+free_page4:
+	__free_page(hsave_page);
 free_page3:
 	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
@@ -1209,6 +1597,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+	avic_vcpu_uninit(vcpu);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -3372,6 +3761,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
 	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
 	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
+	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
 	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
 	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
 	pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
@@ -3603,7 +3993,17 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 
 static bool svm_get_enable_apicv(void)
 {
-	return false;
+	return avic;
+}
+
+static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
+	return;
+}
+
+static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
+{
+	return;
 }
 
 static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
@@ -4375,6 +4775,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.sync_pir_to_irr = svm_sync_pir_to_irr,
+	.hwapic_irr_update = svm_hwapic_irr_update,
+	.hwapic_isr_update = svm_hwapic_isr_update,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 15:38   ` Paolo Bonzini
  2016-02-12 13:59 ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Introduce VMEXIT handlers, avic_incp_ipi_interception() and
avic_noaccel_interception().

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/include/uapi/asm/svm.h |   9 +-
 arch/x86/kvm/svm.c              | 241 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 249 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8a4add8..ebfdf8d 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -73,6 +73,8 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_NPF           0x400
+#define SVM_EXIT_AVIC_INCMP_IPI 0x401
+#define SVM_EXIT_AVIC_NOACCEL  0x402
 
 #define SVM_EXIT_ERR           -1
 
@@ -107,8 +109,10 @@
 	{ SVM_EXIT_SMI,         "smi" }, \
 	{ SVM_EXIT_INIT,        "init" }, \
 	{ SVM_EXIT_VINTR,       "vintr" }, \
+	{ SVM_EXIT_CR0_SEL_WRITE, "cr0_sec_write" }, \
 	{ SVM_EXIT_CPUID,       "cpuid" }, \
 	{ SVM_EXIT_INVD,        "invd" }, \
+	{ SVM_EXIT_PAUSE,       "pause" }, \
 	{ SVM_EXIT_HLT,         "hlt" }, \
 	{ SVM_EXIT_INVLPG,      "invlpg" }, \
 	{ SVM_EXIT_INVLPGA,     "invlpga" }, \
@@ -127,7 +131,10 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
-	{ SVM_EXIT_NPF,         "npf" }
+	{ SVM_EXIT_NPF,         "npf" }, \
+	{ SVM_EXIT_RSM,         "rsm" }, \
+	{ SVM_EXIT_AVIC_INCMP_IPI, "avic_incomp_ipi" }, \
+	{ SVM_EXIT_AVIC_NOACCEL,   "avic_noaccel" }
 
 
 #endif /* _UAPI__SVM_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9440b48..bedf52b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3667,6 +3667,245 @@ static int mwait_interception(struct vcpu_svm *svm)
 	return nop_interception(svm);
 }
 
+enum avic_incmp_ipi_err_code {
+	AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
+	AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
+	AVIC_INCMP_IPI_ERR_INV_TARGET,
+	AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
+};
+
+static int avic_incomp_ipi_interception(struct vcpu_svm *svm)
+{
+	u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
+	u32 icrl = svm->vmcb->control.exit_info_1;
+	u32 id = svm->vmcb->control.exit_info_2 >> 32;
+	u32 index = svm->vmcb->control.exit_info_2 && 0xFF;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+	pr_debug("SVM: %s: cpu=%#x, vcpu=%#x, "
+		 "icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
+		 __func__, svm->vcpu.cpu, svm->vcpu.vcpu_id,
+		 icrh, icrl, id, index);
+
+	switch (id) {
+	case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
+		/*
+		 * AVIC hardware handles the generation of
+		 * IPIs when the specified Message Type is Fixed
+		 * (also known as fixed delivery mode) and
+		 * the Trigger Mode is edge-triggered. The hardware
+		 * also supports self and broadcast delivery modes
+		 * specified via the Destination Shorthand(DSH)
+		 * field of the ICRL. Logical and physical APIC ID
+		 * formats are supported. All other IPI types cause
+		 * a #VMEXIT, which needs to emulated.
+		 */
+		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
+		break;
+	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
+		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
+		break;
+	case AVIC_INCMP_IPI_ERR_INV_TARGET:
+		pr_err("SVM: %s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
+		       __func__, icrh, icrl, index);
+		BUG();
+		break;
+	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
+		pr_err("SVM: %s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
+		       __func__, icrh, icrl, index);
+		BUG();
+		break;
+	default:
+		pr_err("SVM: Unknown IPI interception\n");
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_trap_write(struct vcpu_svm *svm)
+{
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	struct svm_vm_data *vm_data = svm->vcpu.kvm->arch.arch_data;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+	u32 reg = *avic_get_bk_page_entry(svm, offset);
+
+	pr_debug("SVN: %s: offset=%#x, val=%#x, (cpu=%x) (vcpu_id=%x)\n",
+		 __func__, offset, reg, svm->vcpu.cpu, svm->vcpu.vcpu_id);
+
+	switch (offset) {
+	case APIC_ID: {
+		u32 aid = (reg >> 24) & 0xff;
+		struct svm_avic_phy_ait_entry *o_ent =
+				avic_get_phy_ait_entry(&svm->vcpu, svm->vcpu.vcpu_id);
+		struct svm_avic_phy_ait_entry *n_ent =
+				avic_get_phy_ait_entry(&svm->vcpu, aid);
+
+		if (!n_ent || !o_ent)
+			return 0;
+
+		pr_debug("SVN: %s: APIC_ID=%#x (id=%x)\n", __func__, reg, aid);
+
+		/* We need to move phy_apic_entry to new offset */
+		*n_ent = *o_ent;
+		*((u64 *)o_ent) = 0ULL;
+		break;
+	}
+	case APIC_LDR: {
+		int ret, lid;
+		int dlid = (reg >> 24) & 0xff;
+
+		if (!dlid)
+			return 0;
+
+		lid = ffs(dlid) - 1;
+		pr_debug("SVM: %s: LDR=%0#10x (lid=%x)\n", __func__, reg, lid);
+		ret = avic_init_log_apic_entry(&svm->vcpu, svm->vcpu.vcpu_id,
+					       lid);
+		if (ret)
+			return 0;
+
+		break;
+	}
+	case APIC_DFR: {
+		u32 mod = (*avic_get_bk_page_entry(svm, offset) >> 28) & 0xf;
+
+		pr_debug("SVM: %s: DFR=%#x (%s)\n", __func__,
+			 mod, mod == 0xf? "flat": "cluster");
+
+		/*
+		 * We assume that all local APICs are using the same type.
+		 * If this changes, we need to rebuild the AVIC logical
+		 * APID id table with subsequent write to APIC_LDR.
+		 */
+		if (vm_data->ldr_mode != mod) {
+			clear_page(page_address(vm_data->avic_log_ait_page));
+			vm_data->ldr_mode = mod;
+		}
+		break;
+	}
+	case APIC_TMICT: {
+		u32 val = kvm_apic_get_reg(apic, APIC_TMICT);
+
+		pr_debug("SVM: %s: TMICT=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	case APIC_ESR: {
+		u32 val = kvm_apic_get_reg(apic, APIC_ESR);
+
+		pr_debug("SVM: %s: ESR=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	case APIC_LVTERR: {
+		u32 val = kvm_apic_get_reg(apic, APIC_LVTERR);
+
+		pr_debug("SVM: %s: LVTERR=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	default:
+		break;
+	}
+
+	kvm_lapic_reg_write(apic, offset, reg);
+
+	return 1;
+}
+
+static int avic_noaccel_fault_read(struct vcpu_svm *svm)
+{
+	u32 val;
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+	pr_debug("SVM: %s: offset=%x\n", __func__, offset);
+
+	switch (offset) {
+	case APIC_TMCCT: {
+		if (kvm_lapic_reg_read(apic, offset, 4, &val))
+			return 0;
+
+		pr_debug("SVM: %s: TMCCT: rip=%#lx, next_rip=%#llx, val=%#x)\n",
+			 __func__, kvm_rip_read(&svm->vcpu), svm->next_rip, val);
+
+		*avic_get_bk_page_entry(svm, offset) = val;
+		break;
+	}
+	default:
+		pr_debug("SVM: %s: (rip=%#lx), offset=%#x\n", __func__,
+			 kvm_rip_read(&svm->vcpu), offset);
+		break;
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_fault_write(struct vcpu_svm *svm)
+{
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+
+	pr_debug("SVM: %s: offset=%x\n", __func__, offset);
+
+	switch (offset) {
+	case APIC_ARBPRI: /* APR: Arbitration Priority Register */
+	case APIC_TMCCT: /* Timer Current Count */
+		/* TODO */
+		break;
+	default:
+		BUG();
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_interception(struct vcpu_svm *svm)
+{
+	int ret = 0;
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	u32 rw = (svm->vmcb->control.exit_info_1 >> 32) & 0x1;
+	u32 vector = svm->vmcb->control.exit_info_2 & 0xFFFFFFFF;
+
+	pr_debug("SVM: %s: offset=%#x, rw=%#x, vector=%#x, vcpu_id=%#x, cpu=%#x\n",
+		 __func__, offset, rw, vector, svm->vcpu.vcpu_id, svm->vcpu.cpu);
+
+	BUG_ON (offset >= 0x400);
+
+	switch(offset) {
+	case APIC_ID:
+	case APIC_EOI:
+	case APIC_RRR:
+	case APIC_LDR:
+	case APIC_DFR:
+	case APIC_SPIV:
+	case APIC_ESR:
+	case APIC_ICR:
+	case APIC_LVTT:
+	case APIC_LVTTHMR:
+	case APIC_LVTPC:
+	case APIC_LVT0:
+	case APIC_LVT1:
+	case APIC_LVTERR:
+	case APIC_TMICT:
+	case APIC_TDCR: {
+		/* Handling Trap */
+		if (!rw) /* Trap read should never happens */
+			BUG();
+		ret = avic_noaccel_trap_write(svm);
+		break;
+	}
+	default: {
+		/* Handling Fault */
+		if (rw)
+			ret = avic_noaccel_fault_write(svm);
+		else
+			ret = avic_noaccel_fault_read(svm);
+		skip_emulated_instruction(&svm->vcpu);
+	}
+	}
+
+	return ret;
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3730,6 +3969,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_NPF]				= pf_interception,
 	[SVM_EXIT_RSM]                          = emulate_on_interception,
+	[SVM_EXIT_AVIC_INCMP_IPI]		= avic_incomp_ipi_interception,
+	[SVM_EXIT_AVIC_NOACCEL]			= avic_noaccel_interception,
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (4 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 14:16   ` Borislav Petkov
  2016-02-12 15:55   ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Paolo Bonzini
  2016-02-12 13:59 ` [PART1 RFC 7/9] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

VINTR is not supported when enable AVIC. Therefore, we need to inject
interrupt via APIC backing page instead. Also, adding AVIC doorbell
support to signal running vcpu to check IRR for injected interrupts.

This patch also introduces kvm_x86_ops.apicv_intr_pending() to allow SVM
to provide a function hook to query AVIC interrupt pending status.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h  |   2 +
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/kvm/lapic.c             |   3 +-
 arch/x86/kvm/svm.c               | 110 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c               |   4 +-
 5 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b78328..a7c8852 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -973,6 +973,8 @@ struct kvm_x86_ops {
 	void (*post_block)(struct kvm_vcpu *vcpu);
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
+
+	bool (*apicv_intr_pending)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b05402e..605b869 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -284,6 +284,7 @@
 #define MSR_AMD64_TSC_RATIO		0xc0000104
 #define MSR_AMD64_NB_CFG		0xc001001f
 #define MSR_AMD64_PATCH_LOADER		0xc0010020
+#define MSR_AMD64_AVIC_DOORBELL		0xc001011b
 #define MSR_AMD64_OSVW_ID_LENGTH	0xc0010140
 #define MSR_AMD64_OSVW_STATUS		0xc0010141
 #define MSR_AMD64_LS_CFG		0xc0011020
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fc313a0..f6deb04 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1828,7 +1828,8 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	int highest_irr;
 
-	if (!kvm_vcpu_has_lapic(vcpu) || !apic_enabled(apic))
+	if (!kvm_vcpu_has_lapic(vcpu) || !apic_enabled(apic) ||
+	    kvm_x86_ops->apicv_intr_pending)
 		return -1;
 
 	apic_update_ppr(apic);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bedf52b..5d7b049 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -169,6 +169,7 @@ struct vcpu_svm {
 	bool nrips_enabled	: 1;
 
 	struct page *avic_bk_page;
+	atomic_t avic_pending_cnt;
 };
 
 struct __attribute__ ((__packed__))
@@ -232,6 +233,8 @@ static bool npt_enabled = true;
 static bool npt_enabled;
 #endif
 
+static struct kvm_x86_ops svm_x86_ops;
+
 /* allow nested paging (virtualized MMU) for all guests */
 static int npt = true;
 module_param(npt, int, S_IRUGO);
@@ -974,6 +977,9 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+	} else {
+		svm_x86_ops.deliver_posted_interrupt = NULL;
+		svm_x86_ops.apicv_intr_pending = NULL;
 	}
 
 	return 0;
@@ -1188,8 +1194,10 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	enable_gif(svm);
 
-	if (avic)
+	if (avic) {
 		avic_init_vmcb(svm);
+		atomic_set(&svm->avic_pending_cnt, 0);
+	}
 }
 
 static struct svm_avic_phy_ait_entry *
@@ -3059,8 +3067,10 @@ static int clgi_interception(struct vcpu_svm *svm)
 	disable_gif(svm);
 
 	/* After a CLGI no interrupts should come */
-	svm_clear_vintr(svm);
-	svm->vmcb->control.v_irq = 0;
+	if (!avic) {
+		svm_clear_vintr(svm);
+		svm->vmcb->control.v_irq = 0;
+	}
 
 	mark_dirty(svm->vmcb, VMCB_INTR);
 
@@ -3635,6 +3645,9 @@ static int msr_interception(struct vcpu_svm *svm)
 
 static int interrupt_window_interception(struct vcpu_svm *svm)
 {
+	if (avic)
+		BUG_ON(1);
+
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	svm_clear_vintr(svm);
 	svm->vmcb->control.v_irq = 0;
@@ -4190,7 +4203,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
 	struct vmcb_control_area *control;
 
-
+	/* The following fields are ignored when AVIC is enabled */
 	control = &svm->vmcb->control;
 	control->int_vector = irq;
 	control->v_intr_prio = 0xf;
@@ -4261,6 +4274,27 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
+
+	/* Note:
+	 * This gives us a hint to check for pending interrupts
+	 * during #VMEXIT.
+	 */
+	atomic_inc(&svm->avic_pending_cnt);
+
+	if (vcpu->mode == IN_GUEST_MODE) {
+		wrmsrl(MSR_AMD64_AVIC_DOORBELL,
+		       __default_cpu_present_to_apicid(vcpu->cpu));
+	} else {
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4321,6 +4355,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 	 * get that intercept, this function will be called again though and
 	 * we'll get the vintr intercept.
 	 */
+	if (avic)
+		return;
+
 	if (gif_set(svm) && nested_svm_intr(svm)) {
 		svm_set_vintr(svm);
 		svm_inject_irq(svm, 0x0);
@@ -4462,6 +4499,67 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
 	svm_complete_interrupts(svm);
 }
 
+static bool avic_check_irr_pending(struct kvm_vcpu *vcpu)
+{
+	int i;
+	u32 irr;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	for (i = 0; i < 8; i++) {
+		irr = *(avic_get_bk_page_entry(svm,
+					APIC_IRR + (0x10 * i)));
+		if (irr)
+			return true;
+	}
+
+	return false;
+}
+
+static bool svm_avic_check_ppr(struct vcpu_svm *svm)
+{
+	u32 tpr = *(avic_get_bk_page_entry(svm, APIC_TASKPRI));
+	u32 ppr = *(avic_get_bk_page_entry(svm, APIC_PROCPRI));
+
+	if (ppr && (ppr != tpr))
+		return true;
+
+	return false;
+}
+
+/* Note: Returns true means do not block */
+static bool svm_apicv_intr_pending (struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!avic)
+		return false;
+
+	if (atomic_read(&svm->avic_pending_cnt))
+		return true;
+
+	return avic_check_irr_pending(vcpu);
+}
+
+static void avic_post_vmrun(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!avic)
+		return;
+
+	if (atomic_read(&svm->avic_pending_cnt)) {
+		if (svm_avic_check_ppr(svm))
+			return;
+		if (avic_check_irr_pending(vcpu))
+			return;
+		/*
+		 * At this point, if there is no interrupt pending.
+		 * So, we decrement the pending count
+		 */
+		atomic_dec(&svm->avic_pending_cnt);
+	}
+}
+
 static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4588,6 +4686,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_after_handle_nmi(&svm->vcpu);
 
+	avic_post_vmrun(vcpu);
+
 	sync_cr8_to_lapic(vcpu);
 
 	svm->next_rip = 0;
@@ -5050,7 +5150,9 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.sched_in = svm_sched_in,
 
+	.apicv_intr_pending = svm_apicv_intr_pending,
 	.pmu_ops = &amd_pmu_ops,
+	.deliver_posted_interrupt = svm_deliver_avic_intr,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4244c2b..2def290 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8087,7 +8087,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
 		kvm_x86_ops->check_nested_events(vcpu, false);
 
-	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
+	return (kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu) ||
+		(kvm_x86_ops->apicv_intr_pending &&
+		 kvm_x86_ops->apicv_intr_pending(vcpu)));
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 7/9] svm: Do not expose x2APIC when enable AVIC
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (5 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 13:59 ` [PART1 RFC 8/9] svm: Do not intercept CR8 " Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Since AVIC only virtualizes xAPIC hardware for the guest, we need to:
    * Intercept APIC BAR msr accesses to disable x2APIC
    * Intercept CPUID access to not advertise x2APIC support
    * Hide x2APIC support when checking via KVM ioctl

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5d7b049..0998e67 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -223,6 +223,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
+	{ .index = MSR_IA32_APICBASE,			.always = false },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
@@ -850,6 +851,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
 		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
 	}
+
+	if (avic)
+		set_msr_interception(msrpm, MSR_IA32_APICBASE, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -3480,6 +3484,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = 0x1E;
 		}
 		break;
+	case MSR_IA32_APICBASE:
+		if (avic) {
+			/* Note:
+			 * For AVIC, we need to disable X2APIC
+			 * and enable XAPIC
+			 */
+			kvm_get_msr_common(vcpu, msr_info);
+			msr_info->data &= ~X2APIC_ENABLE;
+			msr_info->data |= XAPIC_ENABLE;
+			break;
+		}
+		/* Follow through if not AVIC */
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -3608,6 +3624,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_VM_IGNNE:
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_IA32_APICBASE:
+		if (avic)
+			avic_update_vapic_bar(to_svm(vcpu), data);
+		/* Follow through */
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
@@ -4785,11 +4805,26 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 
 	/* Update nrips enabled cache */
 	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
+
+	/* Do not support X2APIC when enable AVIC */
+	if (avic) {
+		int i;
+
+		for (i = 0 ; i < vcpu->arch.cpuid_nent ; i++) {
+			if (vcpu->arch.cpuid_entries[i].function == 1)
+				vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
+		}
+	}
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
 	switch (func) {
+	case 0x00000001:
+		/* Do not support X2APIC when enable AVIC */
+		if (avic)
+			entry->ecx &= ~(1 << 21);
+		break;
 	case 0x80000001:
 		if (nested)
 			entry->ecx |= (1 << 2); /* Set SVM bit */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 8/9] svm: Do not intercept CR8 when enable AVIC
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (6 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 7/9] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 15:48   ` Paolo Bonzini
  2016-02-12 13:59 ` [PART1 RFC 9/9] svm: Manage vcpu load/unload " Suravee Suthikulpanit
  2016-02-12 18:13 ` [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Paolo Bonzini
  9 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

When enable AVIC:
    * Do not intercept CR8 since this should be handled by AVIC HW.
    * Also update TPR in APIC backing page when syncing CR8 before VMRUN

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0998e67..32da657 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -981,6 +981,9 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+
+		/* Do not do cr8 intercept if AVIC is enabled. */
+		svm_x86_ops.update_cr8_intercept = NULL;
 	} else {
 		svm_x86_ops.deliver_posted_interrupt = NULL;
 		svm_x86_ops.apicv_intr_pending = NULL;
@@ -1098,7 +1101,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
-	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+	if (!avic)
+		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -4248,7 +4252,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
+	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
+	     avic)
 		return;
 
 	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
@@ -4440,8 +4445,10 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
 		return;
 
-	cr8 = kvm_get_cr8(vcpu);
-	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
+	svm->vmcb->control.v_tpr = cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
+
+	if (avic)
+		*(avic_get_bk_page_entry(svm, APIC_TASKPRI)) = (u32)cr8 << 4;
 }
 
 static void svm_complete_interrupts(struct vcpu_svm *svm)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PART1 RFC 9/9] svm: Manage vcpu load/unload when enable AVIC
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (7 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 8/9] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-02-12 13:59 ` Suravee Suthikulpanit
  2016-02-12 15:46   ` Paolo Bonzini
  2016-02-12 18:13 ` [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Paolo Bonzini
  9 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 13:59 UTC (permalink / raw)
  To: joro, alex.williamson, gleb, pbonzini
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

When a vcpu is loaded/unloaded to a physical core, we need to update
information in the Physical APIC-ID table accordingly.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 32da657..41e68d2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -32,6 +32,7 @@
 #include <linux/trace_events.h>
 #include <linux/slab.h>
 
+#include <asm/apic.h>
 #include <asm/perf_event.h>
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
@@ -1508,6 +1509,61 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
 	return 0;
 }
 
+static inline int avic_update_iommu(struct kvm_vcpu *vcpu, int cpu,
+				    phys_addr_t pa, bool is_running)
+{
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	/* TODO: We will hook up with IOMMU API at later time */
+	return 0;
+}
+
+static int avic_set_running(struct kvm_vcpu *vcpu, int cpu, bool is_running)
+{
+	int g_phy_apic_id, h_phy_apic_id;
+	struct svm_avic_phy_ait_entry *entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!avic)
+		return 0;
+
+	if (!svm)
+		return -EINVAL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	g_phy_apic_id = vcpu->vcpu_id;
+	h_phy_apic_id = __default_cpu_present_to_apicid(cpu);
+
+	if ((g_phy_apic_id >= AVIC_PHY_APIC_ID_MAX) ||
+	    (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX))
+		return -EINVAL;
+
+	entry = avic_get_phy_ait_entry(vcpu, g_phy_apic_id);
+	if (!entry)
+		return -EINVAL;
+
+	if (is_running) {
+		phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+		entry->bk_pg_ptr = (pa >> 12) & 0xffffffffff;
+		entry->valid = 1;
+		entry->host_phy_apic_id = h_phy_apic_id;
+		barrier();
+		entry->is_running = is_running;
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, is_running);
+	} else {
+		ret = avic_update_iommu(vcpu, 0, 0, is_running);
+		barrier();
+		entry->is_running = is_running;
+	}
+
+	return ret;
+}
+
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1628,6 +1684,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		mark_all_dirty(svm->vmcb);
 	}
 
+	avic_set_running(vcpu, cpu, true);
+
 #ifdef CONFIG_X86_64
 	rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
 #endif
@@ -1668,6 +1726,9 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 #endif
 	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+
+	avic_set_running(vcpu, 0, false);
+
 }
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support
  2016-02-12 13:59 ` [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-02-12 14:13   ` Borislav Petkov
  2016-02-12 15:46     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2016-02-12 14:13 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: joro, alex.williamson, gleb, pbonzini, kvm, linux-kernel, wei,
	sherry.hurwitz

On Fri, Feb 12, 2016 at 08:59:29PM +0700, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> 
> This patch introduces AVIC-related data structure, and AVIC
> intitialization code.
> 
> There are three main data structures for AVIC:
>     * Virtual APIC (vAPIC) backing page (per-VCPU)
>     * Physical APIC ID table (per-VM)
>     * Logical APIC ID table (per-VM)
> 
> In order to accommodate the new per-VM tables, we introduce
> a new per-VM arch-specific void pointer, struct kvm_arch.arch_data.
> This will point to the newly introduced struct svm_vm_data.
> 
> This patch also introduces code to detect the new new SVM feature CPUID
> Fn8000_000A_EDX[13], which identifies support for AMD Advance Virtual
> Interrupt Controller (AVIC).
> 
> Currently, AVIC is disabled by default. Users can manually
> enable AVIC via kernel boot option kvm-amd.avic=1 or during
> kvm-amd module loading with parameter avic=1.
> 
> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/cpufeature.h |   1 +
>  arch/x86/include/asm/kvm_host.h   |   2 +
>  arch/x86/kernel/cpu/scattered.c   |   1 +
>  arch/x86/kvm/svm.c                | 404 +++++++++++++++++++++++++++++++++++++-
>  4 files changed, 407 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 7ad8c94..ee85900 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -203,6 +203,7 @@
>  
>  #define X86_FEATURE_VMMCALL     ( 8*32+15) /* Prefer vmmcall to vmcall */
>  #define X86_FEATURE_XENPV       ( 8*32+16) /* "" Xen paravirtual guest */
> +#define X86_FEATURE_AVIC        ( 8*32+17) /* AMD Virtual Interrupt Controller support */
>  
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 44adbb8..7b78328 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -754,6 +754,8 @@ struct kvm_arch {
>  
>  	bool irqchip_split;
>  	u8 nr_reserved_ioapic_pins;
> +
> +	void *arch_data;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 8cb57df..88cfbe7 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>  		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
>  		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
>  		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
> +		{ X86_FEATURE_AVIC,		CR_EDX,13, 0x8000000a, 0 },
>  		{ 0, 0, 0, 0, 0 }
>  	};

You need to check tip/master when/before/after touching arch/x86/:

a1ff57260818 ("x86/cpufeature: Add AMD AVIC bit")

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 13:59 ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-02-12 14:16   ` Borislav Petkov
  2016-02-12 15:54     ` Suravee Suthikulpanit
  2016-02-12 15:55   ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Paolo Bonzini
  1 sibling, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2016-02-12 14:16 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: joro, alex.williamson, gleb, pbonzini, kvm, linux-kernel, wei,
	sherry.hurwitz

On Fri, Feb 12, 2016 at 08:59:31PM +0700, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> 
> VINTR is not supported when enable AVIC. Therefore, we need to inject
> interrupt via APIC backing page instead. Also, adding AVIC doorbell
> support to signal running vcpu to check IRR for injected interrupts.
> 
> This patch also introduces kvm_x86_ops.apicv_intr_pending() to allow SVM
> to provide a function hook to query AVIC interrupt pending status.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h  |   2 +
>  arch/x86/include/asm/msr-index.h |   1 +
>  arch/x86/kvm/lapic.c             |   3 +-
>  arch/x86/kvm/svm.c               | 110 +++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/x86.c               |   4 +-
>  5 files changed, 114 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7b78328..a7c8852 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -973,6 +973,8 @@ struct kvm_x86_ops {
>  	void (*post_block)(struct kvm_vcpu *vcpu);
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
> +
> +	bool (*apicv_intr_pending)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index b05402e..605b869 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -284,6 +284,7 @@
>  #define MSR_AMD64_TSC_RATIO		0xc0000104
>  #define MSR_AMD64_NB_CFG		0xc001001f
>  #define MSR_AMD64_PATCH_LOADER		0xc0010020
> +#define MSR_AMD64_AVIC_DOORBELL		0xc001011b

AFAICT, that MSR is being used only in arch/x86/kvm/lapic.c

Please add it there.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-12 13:59 ` [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
@ 2016-02-12 15:38   ` Paolo Bonzini
  2016-02-15 19:22     ` Radim Krčmář
  2016-02-16  6:29     ` Suravee Suthikulpanit
  0 siblings, 2 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 15:38 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
> +		 "icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
> +		 __func__, svm->vcpu.cpu, svm->vcpu.vcpu_id,
> +		 icrh, icrl, id, index);
> +
> +	switch (id) {
> +	case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
> +		/*
> +		 * AVIC hardware handles the generation of
> +		 * IPIs when the specified Message Type is Fixed
> +		 * (also known as fixed delivery mode) and
> +		 * the Trigger Mode is edge-triggered. The hardware
> +		 * also supports self and broadcast delivery modes
> +		 * specified via the Destination Shorthand(DSH)
> +		 * field of the ICRL. Logical and physical APIC ID
> +		 * formats are supported. All other IPI types cause
> +		 * a #VMEXIT, which needs to emulated.
> +		 */
> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
> +		break;
> +	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);

Wouldn't this cause a double injection of the IPI if the following happens:

1) destination 1 is running, so the processor sets IRR and sends a
doorbell message

2) destination 2 is not running, so the processor sets IRR and exits

3) destination 1 processes the interrupt, moving it from IRR to ISR

4) destination 1 sends an EOI

5) the source exits and reinjects the interrupt

6) destination 1 then receives the interrupt again.


Or alternatively:

1) destination 1 is not running, so the processor sets IRR and exits

2) another CPU executes VMRUN for destination 1, so the processor
injects the interrupt

3) destination 1 sends an EOI

4) the source exits and reinjects the interrupt

5) destination 1 then receives the interrupt again.


The handling of races for IsRunning and incomplete IPIs has always been
very confusing to me whenever I read the AVIC specification.  It would
be great if you could clarify this.

Paolo

> +		break;
> +	case AVIC_INCMP_IPI_ERR_INV_TARGET:
> +		pr_err("SVM: %s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
> +		       __func__, icrh, icrl, index);
> +		BUG();
> +		break;
> +	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
> +		pr_err("SVM: %s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
> +		       __func__, icrh, icrl, index);
> +		BUG();

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support
  2016-02-12 14:13   ` Borislav Petkov
@ 2016-02-12 15:46     ` Suravee Suthikulpanit
  0 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 15:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: joro, alex.williamson, gleb, pbonzini, kvm, linux-kernel, wei,
	sherry.hurwitz

Hi,

On 02/12/2016 09:13 PM, Borislav Petkov wrote:
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> >index 8cb57df..88cfbe7 100644
>> >--- a/arch/x86/kernel/cpu/scattered.c
>> >+++ b/arch/x86/kernel/cpu/scattered.c
>> >@@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>> >  		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
>> >  		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
>> >  		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
>> >+		{ X86_FEATURE_AVIC,		CR_EDX,13, 0x8000000a, 0 },
>> >  		{ 0, 0, 0, 0, 0 }
>> >  	};
> You need to check tip/master when/before/after touching arch/x86/:
>
> a1ff57260818 ("x86/cpufeature: Add AMD AVIC bit")

Ok, I'll check that in the next patch series.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 9/9] svm: Manage vcpu load/unload when enable AVIC
  2016-02-12 13:59 ` [PART1 RFC 9/9] svm: Manage vcpu load/unload " Suravee Suthikulpanit
@ 2016-02-12 15:46   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 15:46 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
> +
> +	if (is_running) {
> +		phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
> +
> +		entry->bk_pg_ptr = (pa >> 12) & 0xffffffffff;
> +		entry->valid = 1;
> +		entry->host_phy_apic_id = h_phy_apic_id;
> +		barrier();
> +		entry->is_running = is_running;

I'm not sure if you can rely on the compiler doing the right thing here.
 I would prefer something like:

	new_entry = READ_ONCE(entry);
	new_entry.bk_pg_ptr = (pa >> 12) & 0xffffffffff;
	new_entry.valid = 1;
	new_entry.host_phy_apic_id = h_phy_apic_id;
	new_entry.is_running = is_running;
	WRITE_ONCE(entry, new_entry);

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 8/9] svm: Do not intercept CR8 when enable AVIC
  2016-02-12 13:59 ` [PART1 RFC 8/9] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-02-12 15:48   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 15:48 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> 
> When enable AVIC:
>     * Do not intercept CR8 since this should be handled by AVIC HW.
>     * Also update TPR in APIC backing page when syncing CR8 before VMRUN
> 
> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

Note that latest KVM has a mechanism to disable APIC virtualization on a
per-CPU basis (kvm-vcpu_deactivate_apicv).  You need to implement this too.

Paolo

> ---
>  arch/x86/kvm/svm.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0998e67..32da657 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -981,6 +981,9 @@ static __init int svm_hardware_setup(void)
>  
>  	if (avic) {
>  		printk(KERN_INFO "kvm: AVIC enabled\n");
> +
> +		/* Do not do cr8 intercept if AVIC is enabled. */
> +		svm_x86_ops.update_cr8_intercept = NULL;
>  	} else {
>  		svm_x86_ops.deliver_posted_interrupt = NULL;
>  		svm_x86_ops.apicv_intr_pending = NULL;
> @@ -1098,7 +1101,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
> -	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> +	if (!avic)
> +		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>  
>  	set_dr_intercepts(svm);
>  
> @@ -4248,7 +4252,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> +	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
> +	     avic)
>  		return;
>  
>  	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> @@ -4440,8 +4445,10 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>  		return;
>  
> -	cr8 = kvm_get_cr8(vcpu);
> -	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
> +	svm->vmcb->control.v_tpr = cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
> +
> +	if (avic)
> +		*(avic_get_bk_page_entry(svm, APIC_TASKPRI)) = (u32)cr8 << 4;
>  }
>  
>  static void svm_complete_interrupts(struct vcpu_svm *svm)
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 14:16   ` Borislav Petkov
@ 2016-02-12 15:54     ` Suravee Suthikulpanit
  2016-02-12 17:14       ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 15:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: joro, alex.williamson, gleb, pbonzini, kvm, linux-kernel, wei,
	sherry.hurwitz

Hi,

On 02/12/2016 09:16 PM, Borislav Petkov wrote:
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> >index b05402e..605b869 100644
>> >--- a/arch/x86/include/asm/msr-index.h
>> >+++ b/arch/x86/include/asm/msr-index.h
>> >@@ -284,6 +284,7 @@
>> >  #define MSR_AMD64_TSC_RATIO		0xc0000104
>> >  #define MSR_AMD64_NB_CFG		0xc001001f
>> >  #define MSR_AMD64_PATCH_LOADER		0xc0010020
>> >+#define MSR_AMD64_AVIC_DOORBELL		0xc001011b
> AFAICT, that MSR is being used only in arch/x86/kvm/lapic.c
>
> Please add it there.

Do you mean in the arch/x86/kvm/svm.c? If so, sure, I'll only put it there.

Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 13:59 ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
  2016-02-12 14:16   ` Borislav Petkov
@ 2016-02-12 15:55   ` Paolo Bonzini
  2016-02-12 16:21     ` Suravee Suthikulpanit
  1 sibling, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 15:55 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
> +static bool avic_check_irr_pending(struct kvm_vcpu *vcpu)
> +{
> +	int i;
> +	u32 irr;
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	for (i = 0; i < 8; i++) {
> +		irr = *(avic_get_bk_page_entry(svm,
> +					APIC_IRR + (0x10 * i)));
> +		if (irr)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool svm_avic_check_ppr(struct vcpu_svm *svm)
> +{
> +	u32 tpr = *(avic_get_bk_page_entry(svm, APIC_TASKPRI));
> +	u32 ppr = *(avic_get_bk_page_entry(svm, APIC_PROCPRI));
> +
> +	if (ppr && (ppr != tpr))
> +		return true;
> +
> +	return false;
> +}
> +
> +/* Note: Returns true means do not block */
> +static bool svm_apicv_intr_pending (struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!avic)
> +		return false;
> +
> +	if (atomic_read(&svm->avic_pending_cnt))
> +		return true;
> +
> +	return avic_check_irr_pending(vcpu);
> +}
> +
> +static void avic_post_vmrun(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!avic)
> +		return;
> +
> +	if (atomic_read(&svm->avic_pending_cnt)) {
> +		if (svm_avic_check_ppr(svm))
> +			return;
> +		if (avic_check_irr_pending(vcpu))
> +			return;
> +		/*
> +		 * At this point, if there is no interrupt pending.
> +		 * So, we decrement the pending count
> +		 */
> +		atomic_dec(&svm->avic_pending_cnt);
> +	}
> +}
> +
>  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4588,6 +4686,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
>  		kvm_after_handle_nmi(&svm->vcpu);
>  
> +	avic_post_vmrun(vcpu);
> +
>  	sync_cr8_to_lapic(vcpu);
>  
>  	svm->next_rip = 0;
> @@ -5050,7 +5150,9 @@ static struct kvm_x86_ops svm_x86_ops = {
>  
>  	.sched_in = svm_sched_in,
>  
> +	.apicv_intr_pending = svm_apicv_intr_pending,
>  	.pmu_ops = &amd_pmu_ops,
> +	.deliver_posted_interrupt = svm_deliver_avic_intr,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4244c2b..2def290 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8087,7 +8087,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
>  		kvm_x86_ops->check_nested_events(vcpu, false);
>  
> -	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
> +	return (kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu) ||
> +		(kvm_x86_ops->apicv_intr_pending &&
> +		 kvm_x86_ops->apicv_intr_pending(vcpu)));
>  }

I think this is not necessary.  What you need is to make kvm_lapic's
regs field point to the backing page.  Then when the processor writes to
IRR, kvm_apic_has_interrupt (called through kvm_vcpu_has_events) will
see it.

avic_pending_cnt shouldn't be necessary either.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 15:55   ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Paolo Bonzini
@ 2016-02-12 16:21     ` Suravee Suthikulpanit
  2016-02-12 18:19       ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 02/12/2016 10:55 PM, Paolo Bonzini wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >index 4244c2b..2def290 100644
>> >--- a/arch/x86/kvm/x86.c
>> >+++ b/arch/x86/kvm/x86.c
>> >@@ -8087,7 +8087,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>> >  	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
>> >  		kvm_x86_ops->check_nested_events(vcpu, false);
>> >
>> >-	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
>> >+	return (kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu) ||
>> >+		(kvm_x86_ops->apicv_intr_pending &&
>> >+		 kvm_x86_ops->apicv_intr_pending(vcpu)));
>> >  }
> I think this is not necessary.  What you need is to make kvm_lapic's
> regs field point to the backing page.  Then when the processor writes to
> IRR, kvm_apic_has_interrupt (called through kvm_vcpu_has_events) will
> see it.
>
> avic_pending_cnt shouldn't be necessary either.
>
> Paolo

So, the other thing I am using the avic_pending_cnt for is for the part 
2 of the series (to enable AVIC support in IOMMU) that I am planning to 
send out later. However, it might be good to discuss this at this point.

When the IOMMU cannot inject interrupts into the guest vcpu due to it is 
not running (therefore, it cannot doorbell the vcpu directly), it logs 
the interrupt in the GA log buffer. Then it generates interrupt to 
notify the IOMMU driver that it needs to handle the log entry. Here, the 
IOMMU driver will end up notifying the SVM to scheduling the VCPU in to 
process interrupt.

Here, I have run into issue where the vcpu often goes into idle (i.e. 
scheduled out), and ended up causing IOMMU to generate a lot of the 
entries in the GA log. This really hurts device pass-through performance 
(e.g. for XGBE NIC).

So, what I ended up experimenting with is to set the avic_pending_cnt to 
a larger value (i.e. avic_ga_log_threshold) whenever we processing the 
GA log entry. The intention is to delay the vcpu schedule out in 
expecting that there might be more interrupts coming in soon. I also 
make this threshold value tunable as a module_param.

This actually works well in my experiment, where I can actually get 
about 5% speed up in my netperf test on XGBE NIC pass-through test.
However, I am not sure if this is an acceptable approach. Actually, I 
think it's similar to the halt_poll_ns, but specifically for IOMMU GA 
log in this case.

Let me know what you think.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 15:54     ` Suravee Suthikulpanit
@ 2016-02-12 17:14       ` Borislav Petkov
  2016-02-12 18:21         ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2016-02-12 17:14 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: joro, alex.williamson, gleb, pbonzini, kvm, linux-kernel, wei,
	sherry.hurwitz

On Fri, Feb 12, 2016 at 10:54:43PM +0700, Suravee Suthikulpanit wrote:
> Do you mean in the arch/x86/kvm/svm.c?

Ah, it is svm.c. Yes, where it is being used.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support
  2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (8 preceding siblings ...)
  2016-02-12 13:59 ` [PART1 RFC 9/9] svm: Manage vcpu load/unload " Suravee Suthikulpanit
@ 2016-02-12 18:13 ` Paolo Bonzini
  2016-02-12 19:55   ` Suravee Suthikulpanit
  9 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 18:13 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

I have a few questions about AVIC.

On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
> CURRENT UNSUPPORT USE-CASES
> ===========================
>     - Nested VM
>     - VM Migration

I'm interested in what you mean with "VM migration".  I've noticed that,
because x2APIC is disabled by default, it's possible that a VM on a
non-AVIC host will fail when moved to an AVIC host.  I would prefer if
you kept x2APIC working through VMEXITs.  Getting the full performance
benefit would require disabling x2APIC of course (or waiting for
improved support in the processor), but at least there would be no surprise.

Is it just this, or is there anything else?

(I'm quite surprised that x2APIC is not supported.  MSRs are faster than
memory accesses due to the cost of TLB misses, which can be hefty for VMs).

I have a few other doubts about the architecture that hopefully you (or
Wei Huang too) can clarify before I can review this patch meaningfully.
 I have replied to other patches with these questions.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 16:21     ` Suravee Suthikulpanit
@ 2016-02-12 18:19       ` Paolo Bonzini
  2016-02-12 19:36         ` Suravee Suthikulpanit
  2016-02-19 11:57         ` Suravee Suthikulpanit
  0 siblings, 2 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 18:19 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 17:21, Suravee Suthikulpanit wrote:
> Hi Paolo,
> 
> On 02/12/2016 10:55 PM, Paolo Bonzini wrote:
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> >index 4244c2b..2def290 100644
>>> >--- a/arch/x86/kvm/x86.c
>>> >+++ b/arch/x86/kvm/x86.c
>>> >@@ -8087,7 +8087,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>>> >      if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
>>> >          kvm_x86_ops->check_nested_events(vcpu, false);
>>> >
>>> >-    return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
>>> >+    return (kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu) ||
>>> >+        (kvm_x86_ops->apicv_intr_pending &&
>>> >+         kvm_x86_ops->apicv_intr_pending(vcpu)));
>>> >  }
>> I think this is not necessary.  What you need is to make kvm_lapic's
>> regs field point to the backing page.  Then when the processor writes to
>> IRR, kvm_apic_has_interrupt (called through kvm_vcpu_has_events) will
>> see it.
>>
>> avic_pending_cnt shouldn't be necessary either.
>>
>> Paolo
> 
> So, the other thing I am using the avic_pending_cnt for is for the part
> 2 of the series (to enable AVIC support in IOMMU) that I am planning to
> send out later. However, it might be good to discuss this at this point.

It's better to discuss it later.  For now, I would prefer the AVIC
patches to be as clean as possible, and not know about the IOMMU at all.
 Also, there are a lot of assumptions about how to use kvm_lapic's regs
field for APIC virtualization---dating back to when Intel only
virtualized the TPR field.  Deviating for that would be a recipe for
trouble. :)

Regarding the IOMMU, I'm actually very happy with the way the Intel VT-d
posted interrupts patches worked out, so I would be even more happy if
everything you do fits in the same scheme and reuses the same hooks! :D

> When the IOMMU cannot inject interrupts into the guest vcpu due to it is
> not running (therefore, it cannot doorbell the vcpu directly), it logs
> the interrupt in the GA log buffer.

Where is this documented?

> Then it generates interrupt to
> notify the IOMMU driver that it needs to handle the log entry. Here, the
> IOMMU driver will end up notifying the SVM to scheduling the VCPU in to
> process interrupt.
> 
> Here, I have run into issue where the vcpu often goes into idle (i.e.
> scheduled out), and ended up causing IOMMU to generate a lot of the
> entries in the GA log. This really hurts device pass-through performance
> (e.g. for XGBE NIC).
> 
> So, what I ended up experimenting with is to set the avic_pending_cnt to
> a larger value (i.e. avic_ga_log_threshold) whenever we processing the
> GA log entry. The intention is to delay the vcpu schedule out in
> expecting that there might be more interrupts coming in soon. I also
> make this threshold value tunable as a module_param.
> 
> This actually works well in my experiment, where I can actually get
> about 5% speed up in my netperf test on XGBE NIC pass-through test.
> However, I am not sure if this is an acceptable approach. Actually, I
> think it's similar to the halt_poll_ns, but specifically for IOMMU GA
> log in this case.

Have you retested now that the halt_poll_ns mechanism is dynamic and
enabled by default?  If I read patch 9 right, halt_poll_ns would delay
vcpu_put and IsRunning=0.  Hopefully this is enough to avoid this kind
of notification and make the issue moot.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 17:14       ` Borislav Petkov
@ 2016-02-12 18:21         ` Paolo Bonzini
  2016-02-12 18:30           ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 18:21 UTC (permalink / raw)
  To: Borislav Petkov, Suravee Suthikulpanit
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 18:14, Borislav Petkov wrote:
> On Fri, Feb 12, 2016 at 10:54:43PM +0700, Suravee Suthikulpanit wrote:
>> > Do you mean in the arch/x86/kvm/svm.c?
> Ah, it is svm.c. Yes, where it is being used.

Hmm, currently things such as MSR_VM_HSAVE_PA are defined in msr-index.h.

It's okay for me to move them to kvm_host.h or similar, but they should
all be treated the same.  Right now this means adding the doorbell MSR
to msr-index.h.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 18:21         ` Paolo Bonzini
@ 2016-02-12 18:30           ` Borislav Petkov
  2016-02-12 18:56             ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2016-02-12 18:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

On Fri, Feb 12, 2016 at 07:21:17PM +0100, Paolo Bonzini wrote:
> Hmm, currently things such as MSR_VM_HSAVE_PA are defined in msr-index.h.

But that one is used in 3 files AFAICT.

> It's okay for me to move them to kvm_host.h or similar, but they should
> all be treated the same.  Right now this means adding the doorbell MSR
> to msr-index.h.

Only if it is used in multiple files. The doorbell thing is used once in
svm.c.

We don't want to make msr-index.h an encyclopedia of any MSR ever
defined :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 18:30           ` Borislav Petkov
@ 2016-02-12 18:56             ` Paolo Bonzini
  2016-02-12 19:33               ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 18:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz



On 12/02/2016 19:30, Borislav Petkov wrote:
> On Fri, Feb 12, 2016 at 07:21:17PM +0100, Paolo Bonzini wrote:
>> Hmm, currently things such as MSR_VM_HSAVE_PA are defined in msr-index.h.
> 
> But that one is used in 3 files AFAICT.

Ok, next examples: MSR_VM_CR and MSR_VM_IGNNE. :)

>> It's okay for me to move them to kvm_host.h or similar, but they should
>> all be treated the same.  Right now this means adding the doorbell MSR
>> to msr-index.h.
> 
> Only if it is used in multiple files. The doorbell thing is used once in
> svm.c.
> 
> We don't want to make msr-index.h an encyclopedia of any MSR ever
> defined :-)

Are you okay with moving all the SVM MSRs to virtext.h instead?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 18:56             ` Paolo Bonzini
@ 2016-02-12 19:33               ` Borislav Petkov
  2016-02-16  7:50                 ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2016-02-12 19:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz, x86-ml

On Fri, Feb 12, 2016 at 07:56:59PM +0100, Paolo Bonzini wrote:
> Ok, next examples: MSR_VM_CR and MSR_VM_IGNNE. :)

I knew you were going to dig out some. :-)

> Are you okay with moving all the SVM MSRs to virtext.h instead?

So I would not move any now and cause unnecessary churn. I think it
should be enough if we agree on a strategy wrt msr-index.h and then
follow it. I think we should do something similar to pci_ids.h.

Let me add tip guys to CC.

---
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 552346598dab..75a5bb61d32f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1,7 +1,12 @@
 #ifndef _ASM_X86_MSR_INDEX_H
 #define _ASM_X86_MSR_INDEX_H
 
-/* CPU model specific register (MSR) numbers */
+/*
+ * CPU model specific register (MSR) numbers.
+ *
+ * Do not add new entries to this file unless the definitions are shared
+ * between multiple compilation units.
+ */
 
 /* x86-64 specific MSRs */
 #define MSR_EFER		0xc0000080 /* extended feature register */


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 18:19       ` Paolo Bonzini
@ 2016-02-12 19:36         ` Suravee Suthikulpanit
  2016-02-19 11:57         ` Suravee Suthikulpanit
  1 sibling, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 19:36 UTC (permalink / raw)
  To: Paolo Bonzini, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 2/13/16 01:19, Paolo Bonzini wrote:
>> When the IOMMU cannot inject interrupts into the guest vcpu due to it is
>> >not running (therefore, it cannot doorbell the vcpu directly), it logs
>> >the interrupt in the GA log buffer.
> Where is this documented?
>

http://support.amd.com/TechDocs/48882_IOMMU.pdf

Regards,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support
  2016-02-12 18:13 ` [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Paolo Bonzini
@ 2016-02-12 19:55   ` Suravee Suthikulpanit
  2016-02-12 20:05     ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-12 19:55 UTC (permalink / raw)
  To: Paolo Bonzini, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 2/13/16 01:13, Paolo Bonzini wrote:
> I have a few questions about AVIC.
>
> On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
>> CURRENT UNSUPPORT USE-CASES
>> ===========================
>>      - Nested VM
>>      - VM Migration
>
> I'm interested in what you mean with "VM migration".  I've noticed that,
> because x2APIC is disabled by default, it's possible that a VM on a
> non-AVIC host will fail when moved to an AVIC host.  I would prefer if
> you kept x2APIC working through VMEXITs.  Getting the full performance
> benefit would require disabling x2APIC of course (or waiting for
> improved support in the processor), but at least there would be no surprise.
>
> Is it just this, or is there anything else?

For VM Migration, I meant the process of taking a snapshot of the VM and 
later on loading it back to resume where it left off, and not necessary 
to a different type of system. This would require saving off the VAPIC 
backing page of each vCPU along with the rest of per-VM AVIC data 
structures. IIUC, this might require changes in the QEmu as well.

Moving VM non-AVIC host to AVIC host, and vice versa, would be tricky.

I also see you points about supporting x2APIC. Let me look more into it, 
and then get back to you on this.

>
> (I'm quite surprised that x2APIC is not supported.  MSRs are faster than
> memory accesses due to the cost of TLB misses, which can be hefty for VMs).
>
> I have a few other doubts about the architecture that hopefully you (or
> Wei Huang too) can clarify before I can review this patch meaningfully.
>   I have replied to other patches with these questions.
>
> Paolo
>

I'll go through those and get back to you separately.

Regards,

Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support
  2016-02-12 19:55   ` Suravee Suthikulpanit
@ 2016-02-12 20:05     ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-12 20:05 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 12/02/2016 20:55, Suravee Suthikulpanit wrote:
>> Is it just this, or is there anything else?
> 
> For VM Migration, I meant the process of taking a snapshot of the VM and
> later on loading it back to resume where it left off, and not necessary
> to a different type of system. This would require saving off the VAPIC
> backing page of each vCPU along with the rest of per-VM AVIC data
> structures. IIUC, this might require changes in the QEmu as well.

It comes for free if you use regs for the backing page as I mentioned.
QEMU then manipulates the registers as usual throgh
KVM_GET_LAPIC/KVM_SET_LAPIC.  Same for non-AVIC <-> AVIC migration.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-12 15:38   ` Paolo Bonzini
@ 2016-02-15 19:22     ` Radim Krčmář
  2016-02-16  6:29     ` Suravee Suthikulpanit
  1 sibling, 0 replies; 52+ messages in thread
From: Radim Krčmář @ 2016-02-15 19:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-12 16:38+0100, Paolo Bonzini:
> On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
>> +	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
>> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
> 
> Wouldn't this cause a double injection of the IPI if the following happens:

I think it will.  (IRR was written to APIC pages, so hypervisor's only
job is to make sure that all targeted VCPUs eventually run.)

> The handling of races for IsRunning and incomplete IPIs has always been
> very confusing to me whenever I read the AVIC specification.  It would
> be great if you could clarify this.

Yeah, we bug there as well:  If all target VCPUs have IsRunning set and
are in the process of being scheduled out (avic_set_running false), then
there is no VMEXIT on IPI and the doorbell does nothing[1];  KVM desn't
re-check pending interrupts before actually scheduling out, therefore
VCPUs will wake only if another interrupt comes.

The hypervisor can manage the IsRunning as it wishes to, so KVM probably
should set IsRunning to false before scanning IRR.


---
1: I didn't find a single mention of a situation when doorbell arrives
   outside of guest mode, so I presume that nothing happens.
   Is it right?

   Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-12 15:38   ` Paolo Bonzini
  2016-02-15 19:22     ` Radim Krčmář
@ 2016-02-16  6:29     ` Suravee Suthikulpanit
  2016-02-16 12:15       ` Paolo Bonzini
  1 sibling, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-16  6:29 UTC (permalink / raw)
  To: Paolo Bonzini, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 2/12/16 22:38, Paolo Bonzini wrote:
>
>
> On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
>> +		 "icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
>> +		 __func__, svm->vcpu.cpu, svm->vcpu.vcpu_id,
>> +		 icrh, icrl, id, index);
>> +
>> +	switch (id) {
>> +	case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
>> +		/*
>> +		 * AVIC hardware handles the generation of
>> +		 * IPIs when the specified Message Type is Fixed
>> +		 * (also known as fixed delivery mode) and
>> +		 * the Trigger Mode is edge-triggered. The hardware
>> +		 * also supports self and broadcast delivery modes
>> +		 * specified via the Destination Shorthand(DSH)
>> +		 * field of the ICRL. Logical and physical APIC ID
>> +		 * formats are supported. All other IPI types cause
>> +		 * a #VMEXIT, which needs to emulated.
>> +		 */
>> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>> +		break;
>> +	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
>> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>
> Wouldn't this cause a double injection of the IPI if the following happens:
>
> 1) destination 1 is running, so the processor sets IRR and sends a
> doorbell message
>
> 2) destination 2 is not running, so the processor sets IRR and exits
>
> 3) destination 1 processes the interrupt, moving it from IRR to ISR
>
> 4) destination 1 sends an EOI
>
> 5) the source exits and reinjects the interrupt
>
> 6) destination 1 then receives the interrupt again.

Not sure if I am following your scenario here.  IIUC, your concern is 
regarding the dest2 that was not running at the time that the IPI 
message is sent to both dest1 and dest2?

In this case, since the HW cannot deliver due to one ore more target 
vcpus due to not running, I believe it would not set the IRR bit of 
dest1, and generate the AVIC_INCOMPLETE_IPI #vmexit above instead. I 
don't think it would progress to step 3 right away.

>
>
> Or alternatively:
>
> 1) destination 1 is not running, so the processor sets IRR and exits
>
> 2) another CPU executes VMRUN for destination 1, so the processor
> injects the interrupt
>
> 3) destination 1 sends an EOI
>
> 4) the source exits and reinjects the interrupt
>
> 5) destination 1 then receives the interrupt again.

Same here, I don't think the HW would set the IRR of the dest1. Instead, 
it would generate the #VMEXIT.

> The handling of races for IsRunning and incomplete IPIs has always been
> very confusing to me whenever I read the AVIC specification.  It would
> be great if you could clarify this.

I'll make sure to confirm with the HW designer again just to be sure.

Thanks,
Suravee

> Paolo
>
>> +		break;
>> +	case AVIC_INCMP_IPI_ERR_INV_TARGET:
>> +		pr_err("SVM: %s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
>> +		       __func__, icrh, icrl, index);
>> +		BUG();
>> +		break;
>> +	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
>> +		pr_err("SVM: %s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
>> +		       __func__, icrh, icrl, index);
>> +		BUG();

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 19:33               ` Borislav Petkov
@ 2016-02-16  7:50                 ` Ingo Molnar
  2016-02-16  8:39                   ` [PATCH] x86/msr: Document msr-index.h rule for addition Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2016-02-16  7:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Suravee Suthikulpanit, joro, alex.williamson,
	gleb, kvm, linux-kernel, wei, sherry.hurwitz, x86-ml


* Borislav Petkov <bp@alien8.de> wrote:

> On Fri, Feb 12, 2016 at 07:56:59PM +0100, Paolo Bonzini wrote:
> > Ok, next examples: MSR_VM_CR and MSR_VM_IGNNE. :)
> 
> I knew you were going to dig out some. :-)
> 
> > Are you okay with moving all the SVM MSRs to virtext.h instead?
> 
> So I would not move any now and cause unnecessary churn. I think it
> should be enough if we agree on a strategy wrt msr-index.h and then
> follow it. I think we should do something similar to pci_ids.h.
> 
> Let me add tip guys to CC.
> 
> ---
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 552346598dab..75a5bb61d32f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1,7 +1,12 @@
>  #ifndef _ASM_X86_MSR_INDEX_H
>  #define _ASM_X86_MSR_INDEX_H
>  
> -/* CPU model specific register (MSR) numbers */
> +/*
> + * CPU model specific register (MSR) numbers.
> + *
> + * Do not add new entries to this file unless the definitions are shared
> + * between multiple compilation units.
> + */

This sounds good to me.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] x86/msr: Document msr-index.h rule for addition
  2016-02-16  7:50                 ` Ingo Molnar
@ 2016-02-16  8:39                   ` Borislav Petkov
  0 siblings, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2016-02-16  8:39 UTC (permalink / raw)
  To: LKML
  Cc: alex.williamson, gleb, joro, kvm, Paolo Bonzini, sherry.hurwitz,
	Suravee Suthikulpanit, wei, x86-ml

From: Borislav Petkov <bp@suse.de>

In order to keep this file's size sensible and not cause too much
unnecessary churn, make the rule explicit - similar to pci_ids.h - that
only MSRs which are used in multiple compilation units, should get added
to it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: alex.williamson@redhat.com
Cc: gleb@kernel.org
Cc: joro@8bytes.org
Cc: kvm@vger.kernel.org
CC: Paolo Bonzini <pbonzini@redhat.com>
Cc: sherry.hurwitz@amd.com
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: wei@redhat.com
Cc: x86-ml <x86@kernel.org>
---
 arch/x86/include/asm/msr-index.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b05402ef3b84..984ab75bf621 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1,7 +1,12 @@
 #ifndef _ASM_X86_MSR_INDEX_H
 #define _ASM_X86_MSR_INDEX_H
 
-/* CPU model specific register (MSR) numbers */
+/*
+ * CPU model specific register (MSR) numbers.
+ *
+ * Do not add new entries to this file unless the definitions are shared
+ * between multiple compilation units.
+ */
 
 /* x86-64 specific MSRs */
 #define MSR_EFER		0xc0000080 /* extended feature register */
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-16  6:29     ` Suravee Suthikulpanit
@ 2016-02-16 12:15       ` Paolo Bonzini
  2016-02-16 14:13         ` Radim Krčmář
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-16 12:15 UTC (permalink / raw)
  To: Suravee Suthikulpanit, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 16/02/2016 07:29, Suravee Suthikulpanit wrote:
> Hi Paolo,
> 
> On 2/12/16 22:38, Paolo Bonzini wrote:
>>
>>
>> On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
>>> +         "icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
>>> +         __func__, svm->vcpu.cpu, svm->vcpu.vcpu_id,
>>> +         icrh, icrl, id, index);
>>> +
>>> +    switch (id) {
>>> +    case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
>>> +        /*
>>> +         * AVIC hardware handles the generation of
>>> +         * IPIs when the specified Message Type is Fixed
>>> +         * (also known as fixed delivery mode) and
>>> +         * the Trigger Mode is edge-triggered. The hardware
>>> +         * also supports self and broadcast delivery modes
>>> +         * specified via the Destination Shorthand(DSH)
>>> +         * field of the ICRL. Logical and physical APIC ID
>>> +         * formats are supported. All other IPI types cause
>>> +         * a #VMEXIT, which needs to emulated.
>>> +         */
>>> +        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>>> +        kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>>> +        break;
>>> +    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
>>> +        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>>> +        kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>>
>> Wouldn't this cause a double injection of the IPI if the following
>> happens:
>>
>> 1) destination 1 is running, so the processor sets IRR and sends a
>> doorbell message
>>
>> 2) destination 2 is not running, so the processor sets IRR and exits
>>
>> 3) destination 1 processes the interrupt, moving it from IRR to ISR
>>
>> 4) destination 1 sends an EOI
>>
>> 5) the source exits and reinjects the interrupt
>>
>> 6) destination 1 then receives the interrupt again.
> 
> Not sure if I am following your scenario here.  IIUC, your concern is
> regarding the dest2 that was not running at the time that the IPI
> message is sent to both dest1 and dest2?
> 
> In this case, since the HW cannot deliver due to one ore more target
> vcpus due to not running, I believe it would not set the IRR bit of
> dest1, and generate the AVIC_INCOMPLETE_IPI #vmexit above instead. I
> don't think it would progress to step 3 right away.

The documentation doesn't say that setting the IRR bit is atomic across
all CPUs (and from a hardware perspective that would be extremely
unlikely).  Another hint in my opinion is that the vmexit is called
"incomplete" IPI, not something like "aborted" IPI.  "abort" might
suggest atomicity, "incomplete" definitely suggests *non*atomicity to me.

Wei, what do you think/recall?

I am afraid that this could be a showstopper for AVIC support in KVM.
The only solution I see is to have a different page for each CPU, so
that only self-IPIs are virtualized.  Then you'd only support
virtualization of self-IPIs, similar to Intel's APICv.

>> The handling of races for IsRunning and incomplete IPIs has always been
>> very confusing to me whenever I read the AVIC specification.  It would
>> be great if you could clarify this.
> 
> I'll make sure to confirm with the HW designer again just to be sure.

Please do, thanks!

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-16 12:15       ` Paolo Bonzini
@ 2016-02-16 14:13         ` Radim Krčmář
  2016-02-16 16:56           ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Radim Krčmář @ 2016-02-16 14:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-16 13:15+0100, Paolo Bonzini:
> On 16/02/2016 07:29, Suravee Suthikulpanit wrote:
>> On 2/12/16 22:38, Paolo Bonzini wrote:
>>> On 12/02/2016 14:59, Suravee Suthikulpanit wrote:
>>>> +    case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
| [...]
>>>> +    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
>>>> +        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>>>> +        kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>>>
>>> Wouldn't this cause a double injection of the IPI if the following
>>> happens:
>>>
>>> 1) destination 1 is running, so the processor sets IRR and sends a
>>> doorbell message
>>>
>>> 2) destination 2 is not running, so the processor sets IRR and exits
>>>
>>> 3) destination 1 processes the interrupt, moving it from IRR to ISR
>>>
>>> 4) destination 1 sends an EOI
>>>
>>> 5) the source exits and reinjects the interrupt
>>>
>>> 6) destination 1 then receives the interrupt again.
>> 
>> Not sure if I am following your scenario here.  IIUC, your concern is
>> regarding the dest2 that was not running at the time that the IPI
>> message is sent to both dest1 and dest2?
>> 
>> In this case, since the HW cannot deliver due to one ore more target
>> vcpus due to not running, I believe it would not set the IRR bit of
>> dest1, and generate the AVIC_INCOMPLETE_IPI #vmexit above instead. I
>> don't think it would progress to step 3 right away.

(AVIC handles logical interrupts and broadcasts, so there is IPI to two
 destinations that won't exit with AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE.)

I assume that the whole step must be completed before a subsequent step
is started, so either all IRRs are written or none is:

 3,4: If any IPI destination is not valid, then AVIC exits with
      !AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN before any IRR is written.
      (Broadcast ignores invalid, which is ok.)
 5: AVIC will set IRR and maybe send doorbell on all valid destinations.
 6: If doorbell wasn't sent to all valid destinations, exit with
    AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN

(Relevant bit of the spec is at the bottom.)

> The documentation doesn't say that setting the IRR bit is atomic across
> all CPUs (and from a hardware perspective that would be extremely
> unlikely).

Yeah, I think atomic there means that it won't race with other writes to
the same byte in IRR.  We're fine as long as AVIC writes IRR before
checking IsRunning on every destination, which it seems to be.

>             Another hint in my opinion is that the vmexit is called
> "incomplete" IPI, not something like "aborted" IPI.  "abort" might
> suggest atomicity, "incomplete" definitely suggests *non*atomicity to me.
> 
> Wei, what do you think/recall?
> 
> I am afraid that this could be a showstopper for AVIC support in KVM.

(It would, but I believe that AVIC designers made it sane and the spec
 doesn't let me read it in a way that supports your theories.)

> The only solution I see is to have a different page for each CPU, so
> that only self-IPIs are virtualized.  Then you'd only support
> virtualization of self-IPIs, similar to Intel's APICv.

We need one APIC page per VCPU and a pair (physical, logical) of APIC ID
tables per VM, but per CPU structures shouldn't be necessary.

>> I'll make sure to confirm with the HW designer again just to be sure.
> 
> Please do, thanks!

Thanks, looking forward to it!

We definitely have incompatible understanding of some aspects. :)


---
APM 2, June 2015, 15.29.2.4 Interrupt Delivery:

  Interprocessor Interrupts. To process an IPI, AVIC hardware executes
  the following steps:
  [...]
  3. If the destination(s) is (are) logically addressed, lookup the
     guest physical APIC IDs for each logical ID using the Logical APIC
     ID table.  If the entry is not valid (V bit is cleared), cause a
     #VMEXIT.  If the entry is valid, but contains an invalid backing
     page pointer, cause a #VMEXIT.

  4. Lookup the vAPIC backing page address in the Physical APIC table using
     the guest physical APIC ID as an index into the table.  For
     directed interrupts, if the selected table entry is not valid,
     cause a #VMEXIT. For broadcast IPIs, invalid entries are ignored.

  5. For every valid destination:
     - Atomically set the appropriate IRR bit in each of the
       destinations’ vAPIC backing page.
     - Check the IsRunning status of each destination.
     - If the destination IsRunning bit is set, send a doorbell message
       using the host physical core number from the Physical APIC ID
       table.

  6. If any destinations are identified as not currently scheduled on a
     physical core (i.e., the IsRunning bit for that virtual processor
     is not set), cause a #VMEXIT.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-16 14:13         ` Radim Krčmář
@ 2016-02-16 16:56           ` Paolo Bonzini
  2016-02-16 18:06               ` Radim Krčmář
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-16 16:56 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz



On 16/02/2016 15:13, Radim Krčmář wrote:
> Yeah, I think atomic there means that it won't race with other writes to
> the same byte in IRR.  We're fine as long as AVIC writes IRR before
> checking IsRunning on every destination, which it seems to be.

More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
flags before checking the result of the AND (6).

> (It would, but I believe that AVIC designers made it sane and the spec
>  doesn't let me read it in a way that supports your theories.)

I hope so as well, and you've probably convinced me.  But I still think
the code is wrong in this patch.  Let's look at the spec that you pasted:

>   3. If the destination(s) is (are) logically addressed, lookup the
>      guest physical APIC IDs for each logical ID using the Logical APIC
>      ID table.  If the entry is not valid (V bit is cleared), cause a
>      #VMEXIT.  If the entry is valid, but contains an invalid backing
>      page pointer, cause a #VMEXIT.
> 
>   4. Lookup the vAPIC backing page address in the Physical APIC table using
>      the guest physical APIC ID as an index into the table.  For
>      directed interrupts, if the selected table entry is not valid,
>      cause a #VMEXIT. For broadcast IPIs, invalid entries are ignored.
> 
>   5. For every valid destination:
>      - Atomically set the appropriate IRR bit in each of the
>        destinations’ vAPIC backing page.
>      - Check the IsRunning status of each destination.
>      - If the destination IsRunning bit is set, send a doorbell message
>        using the host physical core number from the Physical APIC ID
>        table.

This is where the following steps happen:

1) destination 1 is running, so the processor sets IRR and sends a
doorbell message

2) destination 2 is a valid destination, so the processor sets IRR


In the meanwhile destination 1 is running on another VCPU so we can say
that it does the following:

3) destination 1 processes the interrupt, moving it from IRR to ISR

4) destination 1 sends an EOI


>   6. If any destinations are identified as not currently scheduled on a
>      physical core (i.e., the IsRunning bit for that virtual processor
>      is not set), cause a #VMEXIT.

Now the following happens:

5) the source exits and reinjects the interrupt (in Suravee's code, the
VMEXIT handler just writes again to ICR);

6) the KVM code has no way to know that destination 1 has serviced the
interrupt already, so destination 1 then receives the interrupt again.

So perhaps it's enough to change KVM to _not_ modify IRR on an
"incomplete IPI - target not running" vmexit, and instead only do

       kvm_make_request(KVM_REQ_EVENT, vcpu);
       kvm_vcpu_kick(vcpu);

on the destination VCPUs.  That would indeed be simply just be something
to fix in the patches.  Do you agree that this is a bug?

I'm curious about how often the AVIC VMEXIT fires.  Suravee, can you add
debugfs counters for the various incomplete IPI subcauses?


And since we are at it, I'm curious about the following two steps at the
end of 15.29.2.6.

- on VMRUN the interrupt state is evaluated and the highest priority
pending interrupt indicated in the IRR is delivered if interrupt masking
and priority allow

- Any doorbell signals received during VMRUN processing are recognized
immediately after entering the guest

Isn't step 1 exactly the same as evaluating the doorbell signals?  Is
the IRR evaluated only if the hypervisor had rang the doorbell, or
unconditionally?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-16 16:56           ` Paolo Bonzini
@ 2016-02-16 18:06               ` Radim Krčmář
  0 siblings, 0 replies; 52+ messages in thread
From: Radim Krčmář @ 2016-02-16 18:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-16 17:56+0100, Paolo Bonzini:
> On 16/02/2016 15:13, Radim Krčmář wrote:
>> Yeah, I think atomic there means that it won't race with other writes to
>> the same byte in IRR.  We're fine as long as AVIC writes IRR before
>> checking IsRunning on every destination, which it seems to be.
> 
> More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
> flags before checking the result of the AND (6).
> 
>> (It would, but I believe that AVIC designers made it sane and the spec
>>  doesn't let me read it in a way that supports your theories.)
> 
> I hope so as well, and you've probably convinced me.  But I still think
> the code is wrong in this patch.  Let's look at the spec that you pasted:

The code definitely is wrong.  I'll be more specific when disagreeing,
sorry.

> This is where the following steps happen:

  [I completely agree with the race presented here.]

> So perhaps it's enough to change KVM to _not_ modify IRR on an
> "incomplete IPI - target not running" vmexit, and instead only do
> 
>        kvm_make_request(KVM_REQ_EVENT, vcpu);
>        kvm_vcpu_kick(vcpu);
> 
> on the destination VCPUs.  That would indeed be simply just be something
> to fix in the patches.  Do you agree that this is a bug?

Yes.  (We don't even need KVM_REQ_EVENT, because there should be nothing
to do, KVM just has to run the guest.)

> I'm curious about how often the AVIC VMEXIT fires.

>From a theoretical standpoint:

AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:  Not much; OS usually doesn't send
lowest priority IPIs (it's not even supported on Intel), NMI, INIT, ...
and the rest seems to be handled.

AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: depends a lot on host load (and what
the guest does); most IPIs will trigger this on an over-committed host.

AVIC_INCMP_IPI_ERR_INV_TARGET: Almost never; only on guest OS bugs,
where the guest can trigger if it targets non-existing VCPUs.
(Btw. calling BUG() there is a bug.)

AVIC_INCMP_IPI_ERR_INV_BK_PAGE: It's a bug in KVM, so hopefully never.

>                                                     Suravee, can you add
> debugfs counters for the various incomplete IPI subcauses?

Good point, large value in any of those would point to a problem.

> And since we are at it, I'm curious about the following two steps at the
> end of 15.29.2.6.
> 
> - on VMRUN the interrupt state is evaluated and the highest priority
> pending interrupt indicated in the IRR is delivered if interrupt masking
> and priority allow
> 
> - Any doorbell signals received during VMRUN processing are recognized
> immediately after entering the guest
> 
> Isn't step 1 exactly the same as evaluating the doorbell signals?

It is.

>                                                                    Is
> the IRR evaluated only if the hypervisor had rang the doorbell, or
> unconditionally?

Unconditionally.
(Supporting evidence: current code doesn't send doorbell when the VCPU
 is in host mode and I suppose that it works fine. :])

I think these two clauses cover a race on VMRUN:
when processing VMRUN, we might not consider the CPU to be in guest
mode, so these two disambiguate a case when VMRUN has already checked
for IRR (it was empty) and other CPU set IRR and issued doorbell before
VMRUN entered the guest.  (The doorbell could be considered as lost
otherwise, because doorbells in host mode do nothing.)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
@ 2016-02-16 18:06               ` Radim Krčmář
  0 siblings, 0 replies; 52+ messages in thread
From: Radim Krčmář @ 2016-02-16 18:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-16 17:56+0100, Paolo Bonzini:
> On 16/02/2016 15:13, Radim Krčmář wrote:
>> Yeah, I think atomic there means that it won't race with other writes to
>> the same byte in IRR.  We're fine as long as AVIC writes IRR before
>> checking IsRunning on every destination, which it seems to be.
> 
> More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
> flags before checking the result of the AND (6).
> 
>> (It would, but I believe that AVIC designers made it sane and the spec
>>  doesn't let me read it in a way that supports your theories.)
> 
> I hope so as well, and you've probably convinced me.  But I still think
> the code is wrong in this patch.  Let's look at the spec that you pasted:

The code definitely is wrong.  I'll be more specific when disagreeing,
sorry.

> This is where the following steps happen:

  [I completely agree with the race presented here.]

> So perhaps it's enough to change KVM to _not_ modify IRR on an
> "incomplete IPI - target not running" vmexit, and instead only do
> 
>        kvm_make_request(KVM_REQ_EVENT, vcpu);
>        kvm_vcpu_kick(vcpu);
> 
> on the destination VCPUs.  That would indeed be simply just be something
> to fix in the patches.  Do you agree that this is a bug?

Yes.  (We don't even need KVM_REQ_EVENT, because there should be nothing
to do, KVM just has to run the guest.)

> I'm curious about how often the AVIC VMEXIT fires.

From a theoretical standpoint:

AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:  Not much; OS usually doesn't send
lowest priority IPIs (it's not even supported on Intel), NMI, INIT, ...
and the rest seems to be handled.

AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: depends a lot on host load (and what
the guest does); most IPIs will trigger this on an over-committed host.

AVIC_INCMP_IPI_ERR_INV_TARGET: Almost never; only on guest OS bugs,
where the guest can trigger if it targets non-existing VCPUs.
(Btw. calling BUG() there is a bug.)

AVIC_INCMP_IPI_ERR_INV_BK_PAGE: It's a bug in KVM, so hopefully never.

>                                                     Suravee, can you add
> debugfs counters for the various incomplete IPI subcauses?

Good point, large value in any of those would point to a problem.

> And since we are at it, I'm curious about the following two steps at the
> end of 15.29.2.6.
> 
> - on VMRUN the interrupt state is evaluated and the highest priority
> pending interrupt indicated in the IRR is delivered if interrupt masking
> and priority allow
> 
> - Any doorbell signals received during VMRUN processing are recognized
> immediately after entering the guest
> 
> Isn't step 1 exactly the same as evaluating the doorbell signals?

It is.

>                                                                    Is
> the IRR evaluated only if the hypervisor had rang the doorbell, or
> unconditionally?

Unconditionally.
(Supporting evidence: current code doesn't send doorbell when the VCPU
 is in host mode and I suppose that it works fine. :])

I think these two clauses cover a race on VMRUN:
when processing VMRUN, we might not consider the CPU to be in guest
mode, so these two disambiguate a case when VMRUN has already checked
for IRR (it was empty) and other CPU set IRR and issued doorbell before
VMRUN entered the guest.  (The doorbell could be considered as lost
otherwise, because doorbells in host mode do nothing.)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-16 18:06               ` Radim Krčmář
  (?)
@ 2016-02-18  2:25               ` Suravee Suthikulpanit
  2016-02-18 14:18                 ` Radim Krčmář
  -1 siblings, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-18  2:25 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 2/17/16 01:06, Radim Krčmář wrote:
> 2016-02-16 17:56+0100, Paolo Bonzini:
>> >On 16/02/2016 15:13, Radim Krčmář wrote:
>>> >>Yeah, I think atomic there means that it won't race with other writes to
>>> >>the same byte in IRR.  We're fine as long as AVIC writes IRR before
>>> >>checking IsRunning on every destination, which it seems to be.
>> >
>> >More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
>> >flags before checking the result of the AND (6).
>> >
>>> >>(It would, but I believe that AVIC designers made it sane and the spec
>>> >>  doesn't let me read it in a way that supports your theories.)
>> >
>> >I hope so as well, and you've probably convinced me.  But I still think
>> >the code is wrong in this patch.  Let's look at the spec that you pasted:
> The code definitely is wrong.  I'll be more specific when disagreeing,
> sorry.
>

Would you please be a bit more specific on what you think I am not doing 
correctly to handle the #VMEXIT in the case of target not running below.

+    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
+        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+        kvm_lapic_reg_write(apic, APIC_ICR, icrl);

This is actually not just writing to the register. Please note that 
writing to APIC_ICR register would also be calling apic_send_ipi(), 
which results in injecting interrupts to the target core:

From: arch/x86/kvm/lapic.c: kvm_lapic_write_reg()
[....]
         case APIC_ICR:
                 /* No delay here, so we always clear the pending bit */
                 apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
                 apic_send_ipi(apic);
                 break;

         case APIC_ICR2:
                 if (!apic_x2apic_mode(apic))
                         val &= 0xff000000;
                 apic_set_reg(apic, APIC_ICR2, val);
                 break;

Am I missing something?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18  2:25               ` Suravee Suthikulpanit
@ 2016-02-18 14:18                 ` Radim Krčmář
  2016-02-18 14:51                   ` Paolo Bonzini
  2016-02-19 11:32                   ` Suravee Suthikulpanit
  0 siblings, 2 replies; 52+ messages in thread
From: Radim Krčmář @ 2016-02-18 14:18 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Paolo Bonzini, joro, alex.williamson, gleb, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-02-18 09:25+0700, Suravee Suthikulpanit:
> On 2/17/16 01:06, Radim Krčmář wrote:
>>2016-02-16 17:56+0100, Paolo Bonzini:
>>>>On 16/02/2016 15:13, Radim Krčmář wrote:
>>>>>>Yeah, I think atomic there means that it won't race with other writes to
>>>>>>the same byte in IRR.  We're fine as long as AVIC writes IRR before
>>>>>>checking IsRunning on every destination, which it seems to be.
>>>>
>>>>More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
>>>>flags before checking the result of the AND (6).
>>>>
>>>>>>(It would, but I believe that AVIC designers made it sane and the spec
>>>>>>  doesn't let me read it in a way that supports your theories.)
>>>>
>>>>I hope so as well, and you've probably convinced me.  But I still think
>>>>the code is wrong in this patch.  Let's look at the spec that you pasted:
>>The code definitely is wrong.  I'll be more specific when disagreeing,
>>sorry.
>>
> 
> Would you please be a bit more specific on what you think I am not doing
> correctly to handle the #VMEXIT in the case of target not running below.
> 
> +    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
> +        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
> +        kvm_lapic_reg_write(apic, APIC_ICR, icrl);
> 
> This is actually not just writing to the register. Please note that writing
> to APIC_ICR register would also be calling apic_send_ipi(), which results in
> injecting interrupts to the target core:

Exactly.  Injecting the interrupt in AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN
handler is causing the double-injection bug that Paolo described.

> Am I missing something?

Probably that AVIC already wrote to all IRRs (and sent appropriate
doorbells) before this VMEXIT, so KVM shouldn't repeat it.

KVM just has to make sure that targeted VCPUs notice the interrupt,
which means to kick (wake up) VCPUs that don't have IsRunning set.
There is no need to do anything with running VCPUs, because they
 - are in guest mode and noticed the doorbell
 - are in host mode, where they will
   1) VMRUN as fast as they can because the VCPU didn't want to halt
      (and IRR is handled on VMRUN)
   2) check IRR after unsetting IsRunning and goto (1) if there are
      pending interrupts.  (RFC doesn't do this, which is another bug)

It's still possible that we misunderstood the spec.  Does AVIC handle
IPIs differently?

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 14:18                 ` Radim Krčmář
@ 2016-02-18 14:51                   ` Paolo Bonzini
  2016-02-18 15:43                     ` Radim Krčmář
  2016-02-19 11:32                   ` Suravee Suthikulpanit
  1 sibling, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-18 14:51 UTC (permalink / raw)
  To: Radim Krčmář, Suravee Suthikulpanit
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei, sherry.hurwitz



On 18/02/2016 15:18, Radim Krčmář wrote:
> KVM just has to make sure that targeted VCPUs notice the interrupt,
> which means to kick (wake up) VCPUs that don't have IsRunning set.
> There is no need to do anything with running VCPUs, because they
>  - are in guest mode and noticed the doorbell
>  - are in host mode, where they will
>    1) VMRUN as fast as they can because the VCPU didn't want to halt
>       (and IRR is handled on VMRUN)
>    2) check IRR after unsetting IsRunning and goto (1) if there are
>       pending interrupts.  (RFC doesn't do this, which is another bug)

This is not necessary.  IsRunning is only cleared at vcpu_put time.  The
next KVM_RUN will look at IRR (kvm_arch_vcpu_runnable), if necessary set
the mp_state to KVM_MP_STATE_RUNNABLE, and do the VMRUN.

But I agree that this is what Suravee is missing.

> It's still possible that we misunderstood the spec.  Does AVIC handle
> IPIs differently?

I don't think we misunderstood it.  Well, I did, but that's fixed now. :)

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 14:51                   ` Paolo Bonzini
@ 2016-02-18 15:43                     ` Radim Krčmář
  2016-02-18 15:53                       ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Radim Krčmář @ 2016-02-18 15:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-18 15:51+0100, Paolo Bonzini:
> On 18/02/2016 15:18, Radim Krčmář wrote:
>> KVM just has to make sure that targeted VCPUs notice the interrupt,
>> which means to kick (wake up) VCPUs that don't have IsRunning set.
>> There is no need to do anything with running VCPUs, because they
>>  - are in guest mode and noticed the doorbell
>>  - are in host mode, where they will
>>    1) VMRUN as fast as they can because the VCPU didn't want to halt
>>       (and IRR is handled on VMRUN)
>>    2) check IRR after unsetting IsRunning and goto (1) if there are
>>       pending interrupts.  (RFC doesn't do this, which is another bug)
> 
> This is not necessary.  IsRunning is only cleared at vcpu_put time.

It's not necessary if we are being preempted, but it is necessary to
clear IsRunning earlier when we are going to block (i.e. after a halt).

>                                                                      The
> next KVM_RUN will look at IRR (kvm_arch_vcpu_runnable), if necessary set
> the mp_state to KVM_MP_STATE_RUNNABLE, and do the VMRUN.

We're not always going to exit to userspace.  I think the following
order of actions could result in a stuck VM:

(The main idea is that VCPU targets another VCPU between its last check
 for IRR and clearing of IsRunning.)

1) vcpu0 has set IsRunning and is in guest mode.
2) vcpu0 executes HLT and exits guest mode.
3) vcpu0 doesn't have any pending interrupts or other stuff that would
   make kvm_arch_vcpu_runnable() return true.
4) vcpu0 runs kvm_vcpu_block() and gets to call schedule().
5) *vcpu1* sends IPI to vcpu0.  IsRunning is set on vcpu0, so AVIC
   doesn't exit.  A doorbell is sent, but it does nothing.
6) vcpu0 runs schedule() and clears IsRunning in a callback.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 15:43                     ` Radim Krčmář
@ 2016-02-18 15:53                       ` Paolo Bonzini
  2016-02-18 16:27                         ` Radim Krčmář
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-18 15:53 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz



On 18/02/2016 16:43, Radim Krčmář wrote:
> 2016-02-18 15:51+0100, Paolo Bonzini:
>> On 18/02/2016 15:18, Radim Krčmář wrote:
>>> KVM just has to make sure that targeted VCPUs notice the interrupt,
>>> which means to kick (wake up) VCPUs that don't have IsRunning set.
>>> There is no need to do anything with running VCPUs, because they
>>>  - are in guest mode and noticed the doorbell
>>>  - are in host mode, where they will
>>>    1) VMRUN as fast as they can because the VCPU didn't want to halt
>>>       (and IRR is handled on VMRUN)
>>>    2) check IRR after unsetting IsRunning and goto (1) if there are
>>>       pending interrupts.  (RFC doesn't do this, which is another bug)
>>
>> This is not necessary.  IsRunning is only cleared at vcpu_put time.
> 
> It's not necessary if we are being preempted, but it is necessary to
> clear IsRunning earlier when we are going to block (i.e. after a halt).
> 
>>                                                                      The
>> next KVM_RUN will look at IRR (kvm_arch_vcpu_runnable), if necessary set
>> the mp_state to KVM_MP_STATE_RUNNABLE, and do the VMRUN.
> 
> We're not always going to exit to userspace.  I think the following
> order of actions could result in a stuck VM:
> 
> (The main idea is that VCPU targets another VCPU between its last check
>  for IRR and clearing of IsRunning.)
> 
> 1) vcpu0 has set IsRunning and is in guest mode.
> 2) vcpu0 executes HLT and exits guest mode.
> 3) vcpu0 doesn't have any pending interrupts or other stuff that would
>    make kvm_arch_vcpu_runnable() return true.
> 4) vcpu0 runs kvm_vcpu_block() and gets to call schedule().
> 5) *vcpu1* sends IPI to vcpu0.  IsRunning is set on vcpu0, so AVIC
>    doesn't exit.  A doorbell is sent, but it does nothing.
> 6) vcpu0 runs schedule() and clears IsRunning in a callback.

You're completely right.  When the VCPU is halting, the preempt
notifier's sched_out callback is too late to clear IsRunning; you need
to do that before the last time you check for IRR.

Patch 9 is okay, but it is also necessary to clear IsRunning in
kvm_arch_vcpu_blocking and set it in kvm_arch_vcpu_unblocking.  In
addition, vcpu_put/vcpu_load should not modify IsRunning between
kvm_arch_vcpu_blocking and kvm_arch_vcpu_unblocking.  Do you agree?

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 15:53                       ` Paolo Bonzini
@ 2016-02-18 16:27                         ` Radim Krčmář
  2016-02-18 17:18                           ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Radim Krčmář @ 2016-02-18 16:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-02-18 16:53+0100, Paolo Bonzini:
> Patch 9 is okay, but it is also necessary to clear IsRunning in
> kvm_arch_vcpu_blocking and set it in kvm_arch_vcpu_unblocking.  In
> addition, vcpu_put/vcpu_load should not modify IsRunning between
> kvm_arch_vcpu_blocking and kvm_arch_vcpu_unblocking.  Do you agree?

Yes.

I think we don't need to clear IsRunning on preemption, which would
simplify the protection.  (I haven't thought much about userspace exit,
so maybe we could skip that one as well, but we don't need to now.)

The reason for that is that KVM knows that the VCPU was scheduled out,
so it couldn't do much in the AVIC VMEXIT.
(KVM could force scheduler to pritioritize the VCPU, but our kick
 doesn't do that now and it seems like a bad idea.)

Does it seem reasonable?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 16:27                         ` Radim Krčmář
@ 2016-02-18 17:18                           ` Paolo Bonzini
  2016-02-19 11:39                             ` Suravee Suthikulpanit
  2016-03-03 10:42                             ` Suravee Suthikulpanit
  0 siblings, 2 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-18 17:18 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Suravee Suthikulpanit, joro, alex.williamson, gleb, kvm,
	linux-kernel, wei, sherry.hurwitz, Wu, Feng



On 18/02/2016 17:27, Radim Krčmář wrote:
> 2016-02-18 16:53+0100, Paolo Bonzini:
>> Patch 9 is okay, but it is also necessary to clear IsRunning in
>> kvm_arch_vcpu_blocking and set it in kvm_arch_vcpu_unblocking.  In
>> addition, vcpu_put/vcpu_load should not modify IsRunning between
>> kvm_arch_vcpu_blocking and kvm_arch_vcpu_unblocking.  Do you agree?
> 
> Yes.
> 
> I think we don't need to clear IsRunning on preemption, which would
> simplify the protection.  (I haven't thought much about userspace exit,
> so maybe we could skip that one as well, but we don't need to now.)
> 
> The reason for that is that KVM knows that the VCPU was scheduled out,
> so it couldn't do much in the AVIC VMEXIT.
> (KVM could force scheduler to pritioritize the VCPU, but our kick
>  doesn't do that now and it seems like a bad idea.)
> 
> Does it seem reasonable?

Yes, and in fact it wouldn't need to clear and set IsRunning on
vcpu_put/vcpu_load; only on vcpu_blocking/vcpu_unblocking.

The IsRunning flag is more of a IsNotHalted flag, in the end.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 14:18                 ` Radim Krčmář
  2016-02-18 14:51                   ` Paolo Bonzini
@ 2016-02-19 11:32                   ` Suravee Suthikulpanit
  1 sibling, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-19 11:32 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, joro, alex.williamson, gleb, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi,

On 2/18/16 21:18, Radim Krčmář wrote:
> 2016-02-18 09:25+0700, Suravee Suthikulpanit:
>> On 2/17/16 01:06, Radim Krčmář wrote:
>>> 2016-02-16 17:56+0100, Paolo Bonzini:
>>>>> On 16/02/2016 15:13, Radim Krčmář wrote:
>>>>>>> Yeah, I think atomic there means that it won't race with other writes to
>>>>>>> the same byte in IRR.  We're fine as long as AVIC writes IRR before
>>>>>>> checking IsRunning on every destination, which it seems to be.
>>>>>
>>>>> More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
>>>>> flags before checking the result of the AND (6).
>>>>>
>>>>>>> (It would, but I believe that AVIC designers made it sane and the spec
>>>>>>>   doesn't let me read it in a way that supports your theories.)
>>>>>
>>>>> I hope so as well, and you've probably convinced me.  But I still think
>>>>> the code is wrong in this patch.  Let's look at the spec that you pasted:
>>> The code definitely is wrong.  I'll be more specific when disagreeing,
>>> sorry.
>>>
>>
>> Would you please be a bit more specific on what you think I am not doing
>> correctly to handle the #VMEXIT in the case of target not running below.
>>
>> +    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
>> +        kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
>> +        kvm_lapic_reg_write(apic, APIC_ICR, icrl);
>>
>> This is actually not just writing to the register. Please note that writing
>> to APIC_ICR register would also be calling apic_send_ipi(), which results in
>> injecting interrupts to the target core:
>
> Exactly.  Injecting the interrupt in AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN
> handler is causing the double-injection bug that Paolo described.
>
>> Am I missing something?
>
> Probably that AVIC already wrote to all IRRs (and sent appropriate
> doorbells) before this VMEXIT, so KVM shouldn't repeat it.

Ah, Ok I got it now. Thanks for the detail description. I am still 
waiting to hear back from the hardware designer to confirm the HW 
behavior. Meanwhile, I have tried NOT setting the IRR, and only 
kick_vcpu(). And things seem to work fine. Therefore, I think your 
analysis is likely to be correct.

Thanks again,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 17:18                           ` Paolo Bonzini
@ 2016-02-19 11:39                             ` Suravee Suthikulpanit
  2016-02-19 11:44                               ` Paolo Bonzini
  2016-03-03 10:42                             ` Suravee Suthikulpanit
  1 sibling, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-19 11:39 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei,
	sherry.hurwitz, Wu, Feng

Hi,

On 2/19/16 00:18, Paolo Bonzini wrote:
>
>
> On 18/02/2016 17:27, Radim Krčmář wrote:
>> 2016-02-18 16:53+0100, Paolo Bonzini:
>>> Patch 9 is okay, but it is also necessary to clear IsRunning in
>>> kvm_arch_vcpu_blocking and set it in kvm_arch_vcpu_unblocking.  In
>>> addition, vcpu_put/vcpu_load should not modify IsRunning between
>>> kvm_arch_vcpu_blocking and kvm_arch_vcpu_unblocking.  Do you agree?
>>
>> Yes.
>>
>> I think we don't need to clear IsRunning on preemption, which would
>> simplify the protection.  (I haven't thought much about userspace exit,
>> so maybe we could skip that one as well, but we don't need to now.)
>>
>> The reason for that is that KVM knows that the VCPU was scheduled out,
>> so it couldn't do much in the AVIC VMEXIT.
>> (KVM could force scheduler to pritioritize the VCPU, but our kick
>>   doesn't do that now and it seems like a bad idea.)
>>
>> Does it seem reasonable?
>
> Yes, and in fact it wouldn't need to clear and set IsRunning on
> vcpu_put/vcpu_load; only on vcpu_blocking/vcpu_unblocking.
>
> The IsRunning flag is more of a IsNotHalted flag, in the end.
>
> Paolo
>

Good point. I have made the change by introducing new function pointer, 
kvm_x86_ops.vcpu_blocking() and kvm_x86_ops.vcpu_unblocking(). Then 
provides the hook to set/unset the IsRunningBit here. Also, I no longer 
set the bit in the vcpu_load/vcpu_put.

If this is okay. I'll send out V2 soon.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-19 11:39                             ` Suravee Suthikulpanit
@ 2016-02-19 11:44                               ` Paolo Bonzini
  2016-02-19 11:59                                 ` Suravee Suthikulpanit
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:44 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Radim Krčmář
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei,
	sherry.hurwitz, Wu, Feng



On 19/02/2016 12:39, Suravee Suthikulpanit wrote:
> 
> Good point. I have made the change by introducing new function pointer,
> kvm_x86_ops.vcpu_blocking() and kvm_x86_ops.vcpu_unblocking(). Then
> provides the hook to set/unset the IsRunningBit here. Also, I no longer
> set the bit in the vcpu_load/vcpu_put.
> 
> If this is okay. I'll send out V2 soon.

Great, thanks.  Have you tried making the backing page point to the
kvm_lapic's regs?  If so, I think you're ready to send out v2 indeed.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 6/9] svm: Add interrupt injection via AVIC
  2016-02-12 18:19       ` Paolo Bonzini
  2016-02-12 19:36         ` Suravee Suthikulpanit
@ 2016-02-19 11:57         ` Suravee Suthikulpanit
  1 sibling, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-19 11:57 UTC (permalink / raw)
  To: Paolo Bonzini, joro, alex.williamson, gleb
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Radim Krčmář

Hi Paolo,

On 2/13/16 01:19, Paolo Bonzini wrote:
>
>
> On 12/02/2016 17:21, Suravee Suthikulpanit wrote:
>> Hi Paolo,
>>
>> On 02/12/2016 10:55 PM, Paolo Bonzini wrote:
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index 4244c2b..2def290 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -8087,7 +8087,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>>>>>       if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
>>>>>           kvm_x86_ops->check_nested_events(vcpu, false);
>>>>>
>>>>> -    return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
>>>>> +    return (kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu) ||
>>>>> +        (kvm_x86_ops->apicv_intr_pending &&
>>>>> +         kvm_x86_ops->apicv_intr_pending(vcpu)));
>>>>>   }
>>> I think this is not necessary.  What you need is to make kvm_lapic's
>>> regs field point to the backing page.  Then when the processor writes to
>>> IRR, kvm_apic_has_interrupt (called through kvm_vcpu_has_events) will
>>> see it.
>>>
>>> avic_pending_cnt shouldn't be necessary either.
>>>
>>> Paolo

Actually, I also found out during another benchmark (running tar xf 
linux.tar.gz) on the VM w/ multiple cpus, that the performance is quite 
bad due to large amount of AVIC_INCOMP_IPI vmexit for to target not 
running. The same issue does not happen with 1 vcpu, or taskset the tar 
process to one vcpu, or if I put in the logic above in 
kvm_arch_vcpu_runnable() to delay the halting.

>>
>> So, the other thing I am using the avic_pending_cnt for is for the part
>> 2 of the series (to enable AVIC support in IOMMU) that I am planning to
>> send out later. However, it might be good to discuss this at this point.
>
> It's better to discuss it later.  For now, I would prefer the AVIC
> patches to be as clean as possible, and not know about the IOMMU at all.
>   Also, there are a lot of assumptions about how to use kvm_lapic's regs
> field for APIC virtualization---dating back to when Intel only
> virtualized the TPR field.  Deviating for that would be a recipe for
> trouble. :)
>
> Regarding the IOMMU, I'm actually very happy with the way the Intel VT-d
> posted interrupts patches worked out, so I would be even more happy if
> everything you do fits in the same scheme and reuses the same hooks! :D
>
>> When the IOMMU cannot inject interrupts into the guest vcpu due to it is
>> not running (therefore, it cannot doorbell the vcpu directly), it logs
>> the interrupt in the GA log buffer.
>>
>> Then it generates interrupt to
>> notify the IOMMU driver that it needs to handle the log entry. Here, the
>> IOMMU driver will end up notifying the SVM to scheduling the VCPU in to
>> process interrupt.
>>
>> Here, I have run into issue where the vcpu often goes into idle (i.e.
>> scheduled out), and ended up causing IOMMU to generate a lot of the
>> entries in the GA log. This really hurts device pass-through performance
>> (e.g. for XGBE NIC).
>>
>> So, what I ended up experimenting with is to set the avic_pending_cnt to
>> a larger value (i.e. avic_ga_log_threshold) whenever we processing the
>> GA log entry. The intention is to delay the vcpu schedule out in
>> expecting that there might be more interrupts coming in soon. I also
>> make this threshold value tunable as a module_param.
>>
>> This actually works well in my experiment, where I can actually get
>> about 5% speed up in my netperf test on XGBE NIC pass-through test.
>> However, I am not sure if this is an acceptable approach. Actually, I
>> think it's similar to the halt_poll_ns, but specifically for IOMMU GA
>> log in this case.
>
> Have you retested now that the halt_poll_ns mechanism is dynamic and
> enabled by default?  If I read patch 9 right, halt_poll_ns would delay
> vcpu_put and IsRunning=0.  Hopefully this is enough to avoid this kind
> of notification and make the issue moot.
>
> Paolo
>
I assume that the halt_poll_ns mechanism would have been already enabled 
by default as off commit 93c9247cfd1e608e262274616a28632681abb2d3.

So, I have tried playing with halt_poll_ns, halt_poll_ns_[grow|shrink], 
but it doesn't seem to help much.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-19 11:44                               ` Paolo Bonzini
@ 2016-02-19 11:59                                 ` Suravee Suthikulpanit
  0 siblings, 0 replies; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-02-19 11:59 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei,
	sherry.hurwitz, Wu, Feng

Hi

On 2/19/16 18:44, Paolo Bonzini wrote:
>
>
> On 19/02/2016 12:39, Suravee Suthikulpanit wrote:
>>
>> Good point. I have made the change by introducing new function pointer,
>> kvm_x86_ops.vcpu_blocking() and kvm_x86_ops.vcpu_unblocking(). Then
>> provides the hook to set/unset the IsRunningBit here. Also, I no longer
>> set the bit in the vcpu_load/vcpu_put.
>>
>> If this is okay. I'll send out V2 soon.
>
> Great, thanks.  Have you tried making the backing page point to the
> kvm_lapic's regs?  If so, I think you're ready to send out v2 indeed.
>
> Paolo
>

Yes, this also done.  It works quite well. I was just about to reply to 
you regarding this change. (Having trouble looking for the related 
message in the thread :) ).

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-02-18 17:18                           ` Paolo Bonzini
  2016-02-19 11:39                             ` Suravee Suthikulpanit
@ 2016-03-03 10:42                             ` Suravee Suthikulpanit
  2016-03-03 10:50                               ` Paolo Bonzini
  1 sibling, 1 reply; 52+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-03 10:42 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei,
	sherry.hurwitz, Wu, Feng

Hi

On 02/19/2016 12:18 AM, Paolo Bonzini wrote:
>
>
> On 18/02/2016 17:27, Radim Krčmář wrote:
>> 2016-02-18 16:53+0100, Paolo Bonzini:
>>> Patch 9 is okay, but it is also necessary to clear IsRunning in
>>> kvm_arch_vcpu_blocking and set it in kvm_arch_vcpu_unblocking.  In
>>> addition, vcpu_put/vcpu_load should not modify IsRunning between
>>> kvm_arch_vcpu_blocking and kvm_arch_vcpu_unblocking.  Do you agree?
>>
>> Yes.
>>
>> I think we don't need to clear IsRunning on preemption, which would
>> simplify the protection.  (I haven't thought much about userspace exit,
>> so maybe we could skip that one as well, but we don't need to now.)
>>
>> The reason for that is that KVM knows that the VCPU was scheduled out,
>> so it couldn't do much in the AVIC VMEXIT.
>> (KVM could force scheduler to pritioritize the VCPU, but our kick
>>   doesn't do that now and it seems like a bad idea.)
>>
>> Does it seem reasonable?
>
> Yes, and in fact it wouldn't need to clear and set IsRunning on
> vcpu_put/vcpu_load; only on vcpu_blocking/vcpu_unblocking.
>
> The IsRunning flag is more of a IsNotHalted flag, in the end.
>
> Paolo
>

In facts, instead of setting up the vAPIC backing page address when 
calling kvm_arch_vcpu_load(), we should be able to do it when calling 
kvm_arch_vcpu_sched_in(). This seems more appropriate since the 
kvm_arch_vcpu_load() is also called in many unnecessary occasions via 
vcpu_load() (in the arch/x86/kvm/x86.c). The same goes for the 
kvm_arch_vcpu_put().

However, there is no kvm_arch_vcpu_sched_out(). But that can be added 
easily.

What do you think?

Suravee

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC
  2016-03-03 10:42                             ` Suravee Suthikulpanit
@ 2016-03-03 10:50                               ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2016-03-03 10:50 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Radim Krčmář
  Cc: joro, alex.williamson, gleb, kvm, linux-kernel, wei,
	sherry.hurwitz, Wu, Feng



On 03/03/2016 11:42, Suravee Suthikulpanit wrote:
> In facts, instead of setting up the vAPIC backing page address when
> calling kvm_arch_vcpu_load(), we should be able to do it when calling
> kvm_arch_vcpu_sched_in(). This seems more appropriate since the
> kvm_arch_vcpu_load() is also called in many unnecessary occasions via
> vcpu_load() (in the arch/x86/kvm/x86.c). The same goes for the
> kvm_arch_vcpu_put().
> 
> However, there is no kvm_arch_vcpu_sched_out(). But that can be added
> easily.
> 
> What do you think?

How much code is involved?  It seems an unnecessary complication...
Unless you can profile a difference, I would leave it in
kvm_arch_vcpu_load()/kvm_arch_vcpu_put().

Note that sched_in and sched_out are not called once per invocation of
KVM_RUN---only through the preempt notifiers.  So I'm not sure it is
enough to use sched_in and sched_out.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-03-03 10:50 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-12 13:59 [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 1/9] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 2/9] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 3/9] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 4/9] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
2016-02-12 14:13   ` Borislav Petkov
2016-02-12 15:46     ` Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
2016-02-12 15:38   ` Paolo Bonzini
2016-02-15 19:22     ` Radim Krčmář
2016-02-16  6:29     ` Suravee Suthikulpanit
2016-02-16 12:15       ` Paolo Bonzini
2016-02-16 14:13         ` Radim Krčmář
2016-02-16 16:56           ` Paolo Bonzini
2016-02-16 18:06             ` Radim Krčmář
2016-02-16 18:06               ` Radim Krčmář
2016-02-18  2:25               ` Suravee Suthikulpanit
2016-02-18 14:18                 ` Radim Krčmář
2016-02-18 14:51                   ` Paolo Bonzini
2016-02-18 15:43                     ` Radim Krčmář
2016-02-18 15:53                       ` Paolo Bonzini
2016-02-18 16:27                         ` Radim Krčmář
2016-02-18 17:18                           ` Paolo Bonzini
2016-02-19 11:39                             ` Suravee Suthikulpanit
2016-02-19 11:44                               ` Paolo Bonzini
2016-02-19 11:59                                 ` Suravee Suthikulpanit
2016-03-03 10:42                             ` Suravee Suthikulpanit
2016-03-03 10:50                               ` Paolo Bonzini
2016-02-19 11:32                   ` Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
2016-02-12 14:16   ` Borislav Petkov
2016-02-12 15:54     ` Suravee Suthikulpanit
2016-02-12 17:14       ` Borislav Petkov
2016-02-12 18:21         ` Paolo Bonzini
2016-02-12 18:30           ` Borislav Petkov
2016-02-12 18:56             ` Paolo Bonzini
2016-02-12 19:33               ` Borislav Petkov
2016-02-16  7:50                 ` Ingo Molnar
2016-02-16  8:39                   ` [PATCH] x86/msr: Document msr-index.h rule for addition Borislav Petkov
2016-02-12 15:55   ` [PART1 RFC 6/9] svm: Add interrupt injection via AVIC Paolo Bonzini
2016-02-12 16:21     ` Suravee Suthikulpanit
2016-02-12 18:19       ` Paolo Bonzini
2016-02-12 19:36         ` Suravee Suthikulpanit
2016-02-19 11:57         ` Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 7/9] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
2016-02-12 13:59 ` [PART1 RFC 8/9] svm: Do not intercept CR8 " Suravee Suthikulpanit
2016-02-12 15:48   ` Paolo Bonzini
2016-02-12 13:59 ` [PART1 RFC 9/9] svm: Manage vcpu load/unload " Suravee Suthikulpanit
2016-02-12 15:46   ` Paolo Bonzini
2016-02-12 18:13 ` [PART1 RFC 0/9] KVM: x86: Introduce SVM AVIC support Paolo Bonzini
2016-02-12 19:55   ` Suravee Suthikulpanit
2016-02-12 20:05     ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.