linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support
@ 2016-03-18  6:09 Suravee Suthikulpanit
  2016-03-18  6:09 ` [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions Suravee Suthikulpanit
                   ` (11 more replies)
  0 siblings, 12 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

CHANGES FROM RFCv2:
==================
(https://lkml.org/lkml/2016/3/4/746)
    * Do not use bit-field in VMCB.
    * Get rid of struct svm_vm_data, and consolidate the struct members
      into struct kvm_arch.
    * Do not abbreviate AVIC vmexit function and structure names.
    * Adding VM init/uninit function hooks and use it to initialize /
      uninitialize per-VM data structures.  
    * Simplify AVIC backing page allocation to use the emulated lapic
      register page.
    * Introduce kvm_vcpu_wake_up() (from Radim).
    * Clean up AVIC vmexit handling functions.
    * Replace pr_debug with new AVIC tracepoints.
    * Use per-vcpu AVIC enable check (besides the global avic flag).
    * Add smb_mb__after_atomic() in svm_deliver_avic_intr.
    * Get rid of iommu related code for now.

CHANGES FROM RFCv1:
==================
(https://lkml.org/lkml/2016/2/12/225)
    * Rebased from tip.git
    * Use the vAPIC backing page as the emulated LAPIC register page.
    * Clarified with HW engineer that Avic sets the IRR of all the cores
      targeted whether they are running or not. It sends doorbells to the
      ones that are running to get them to evaluate their new state.
      If one or more weren't running it does the VMEXIT to indicate that
      the host needs to go run the not running guests. It gives it the
      ID of one of the guests that it observed not running which should
      be enough of a hint for the hypervisor to track them all down.
    * Rewrite the logic for handling AVIC_INCOMPLETE_IPI #vmexit for
      IPI target not running case based on information above.
    * Rewrite the logic in avic_vcpu_load() and avic_set_running().
    * Rewrite the interrupt injection to remove the avic_pending_cnt
      variable.
    * Adding vcpu blocking/unblocking function hooks

GITHUB
======
Latest git tree can be found at:
    http://github.com/ssuthiku/linux.git    avic_part1_rfc_v3

OVERVIEW
========
This patch set is the first of the two-part patch series to introduce 
the new AMD Advance Virtual Interrupt Controller (AVIC) support.

Basically, SVM AVIC hardware virtualizes local APIC registers of each
vCPU via the virtual APIC (vAPIC) backing page. This allows guest access
to certain APIC registers without the need to emulate the hardware behavior
in the hypervisor. More information about AVIC can be found in the
AMD64 Architecture Programmer’s Manual Volume 2 - System Programming.

  http://support.amd.com/TechDocs/24593.pdf

For SVM AVIC, we extend the existing kvm_amd driver to:
  * Check CPUID to detect AVIC support in the processor
  * Program new fields in VMCB to enable AVIC
  * Introduce new AVIC data structures and add code to manage them
  * Handle two new AVIC #VMEXITs
  * Add new interrupt intjection code using vAPIC backing page
    instead of the existing V_IRQ, V_INTR_PRIO, V_INTR_VECTOR,
    and V_IGN_TPR fields

Currently, this patch series does not enable AVIC by default.
Users can enable SVM AVIC by specifying avic=1 during insmod kvm-amd.

Later, in part 2, we will introduce the IOMMU AVIC support, which
provides speed up for PCI device pass-through use case by allowing
the IOMMU hardware to inject interrupt directly into the guest via
the vAPIC backing page.

PERFORMANCE RESULTS
===================
Currently, AVIC is supported in the AMD family 15h models 6Xh
(Carrizo) processors. Therefore, it is used to collect the 
perforamance data shown below.

Generaly, SVM AVIC alone (w/o IOMMU AVIC) should provide speedup for
IPI interrupt since hypervisor does not require VMEXIT to inject
these interrupts. Also, it should speed up the case when hypervisor
wants to inject an interrupt into a running guest by setting the
corresponded IRR bit in the vAPIC backing page and trigger
AVIC_DOORBELL MSR.

IPI PERFORMANCE
===============

* BENCHMARK 1: HACKBENCH
For IPI, I have collected some performance number on 2 and 4 CPU running
hackbech with the following detail:

  hackbench -p -l 100000
  Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
  Each sender will pass 100000 messages of 100 bytes

                |    2 vcpus     |    4 vcpus 
 ------------------------------------------------
         Vanila |  273.76        |  190.21	
  AVIC disabled |  260.51 (~5%)  |  184.40 (~5%)
           AVIC |  248.53 (~10%) |  155.01 (~20%)

OVERALL PERFORMANCE
===================
Enabling AVIC should helps speeding up workloads, which generate
large amount of interrupts. However, it requires additional logics to
maintain AVIC-specific data structures during vCPU load/unload
due to vcpu scheduling.

The goal is to minimize the overhead of AVIC in most cases, so that
we can achieve equivalent or improvement in overall performance when
enabling AVIC.

* BENCHMARK 1: TAR DECOMPRESSION
This test measures the average running time (of 10 runs) of the following
tar decompression command with 1, 2, and 4 vcpus.

   tar xf linux-4.3.3.tar.xz

                |      4 vcpus 
 ---------------------------------
         Vanila |  10.26  
  AVIC disabled |  10.10 (~1.5%)
           AVIC |  10.07 (~1.8%)

Note: The unit of result below is in seconds (lower is better). 

* BENCHMARK 2: NETPERF w/ virtual network
This test creates a virtual network by setting up bridge and tap device
on the host and pass it into the VM as virtio-net-pci device w/ vhost. 
Then it sets up netserver in the host machine, and run netperf
in the VM with following option:

  netperf -H <netserver ip> -l 60 -t TCP_RR -D 2

                |    1 vcpu      
 ------------------------------------
         Vanila |  21623.887
  AVIC disabled |  21538.09 (~-.4%)
           AVIC |  21712.68 (~0.4%)

Note: The unit of result below is trans/sec (higher is better).

Preliminary result of both benchmarks show AVIC performance are slightly
better than the other two cases.

CURRENT UNTESTED USE-CASES
===========================
    - VM Migration (work in progress)
    - Nested VM

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Radim Krčmář (1):
  KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick

Suravee Suthikulpanit (11):
  KVM: x86: Misc LAPIC changes to expose helper functions
  KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
  svm: Introduce new AVIC VMCB registers
  KVM: x86: Detect and Initialize AVIC support
  svm: Add interrupt injection via AVIC
  KVM: x86: Add trace events for AVIC
  svm: Add VMEXIT handlers for AVIC
  svm: Do not expose x2APIC when enable AVIC
  svm: Do not intercept CR8 when enable AVIC
  svm: Manage vcpu load/unload when enable AVIC

 arch/x86/include/asm/kvm_host.h |  28 +-
 arch/x86/include/asm/svm.h      |  12 +-
 arch/x86/include/uapi/asm/svm.h |   9 +-
 arch/x86/kvm/lapic.c            | 110 ++++---
 arch/x86/kvm/lapic.h            |  11 +
 arch/x86/kvm/svm.c              | 707 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/trace.h            |  57 ++++
 arch/x86/kvm/x86.c              |  12 +-
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |  27 +-
 10 files changed, 891 insertions(+), 83 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 11:16   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks Suravee Suthikulpanit
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Exporting LAPIC utility functions and macros for re-use in SVM code.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/lapic.c | 110 ++++++++++++++++++++++++++-------------------------
 arch/x86/kvm/lapic.h |   8 ++++
 2 files changed, 65 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3a045f3..6da1628 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -59,9 +59,8 @@
 /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
 #define apic_debug(fmt, arg...)
 
-#define APIC_LVT_NUM			6
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION			(0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
 #define APIC_SHORT_MASK			0xc0000
@@ -76,10 +75,11 @@
 #define VEC_POS(v) ((v) & (32 - 1))
 #define REG_POS(v) (((v) >> 5) << 4)
 
-static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
+void kvm_lapic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
 {
 	*((u32 *) (apic->regs + reg_off)) = val;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_set_reg);
 
 static inline int apic_test_vector(int vec, void *bitmap)
 {
@@ -94,10 +94,11 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 		apic_test_vector(vector, apic->regs + APIC_IRR);
 }
 
-static inline void apic_set_vector(int vec, void *bitmap)
+void kvm_lapic_set_vector(int vec, void *bitmap)
 {
 	set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_set_vector);
 
 static inline void apic_clear_vector(int vec, void *bitmap)
 {
@@ -212,7 +213,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 {
 	bool enabled = val & APIC_SPIV_APIC_ENABLED;
 
-	apic_set_reg(apic, APIC_SPIV, val);
+	kvm_lapic_set_reg(apic, APIC_SPIV, val);
 
 	if (enabled != apic->sw_enabled) {
 		apic->sw_enabled = enabled;
@@ -226,13 +227,13 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 
 static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
 {
-	apic_set_reg(apic, APIC_ID, id << 24);
+	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
 static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 {
-	apic_set_reg(apic, APIC_LDR, id);
+	kvm_lapic_set_reg(apic, APIC_LDR, id);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
@@ -240,8 +241,8 @@ static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
 {
 	u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
 
-	apic_set_reg(apic, APIC_ID, id << 24);
-	apic_set_reg(apic, APIC_LDR, ldr);
+	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
+	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
@@ -287,10 +288,10 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
 	if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
 		v |= APIC_LVR_DIRECTED_EOI;
-	apic_set_reg(apic, APIC_LVR, v);
+	kvm_lapic_set_reg(apic, APIC_LVR, v);
 }
 
-static const unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
+static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
 	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
@@ -351,7 +352,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
 {
-	apic_set_vector(vec, apic->regs + APIC_IRR);
+	kvm_lapic_set_vector(vec, apic->regs + APIC_IRR);
 	/*
 	 * irr_pending must be true if any interrupt is pending; set it after
 	 * APIC_IRR to avoid race with apic_clear_irr
@@ -569,7 +570,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
 		   apic, ppr, isr, isrv);
 
 	if (old_ppr != ppr) {
-		apic_set_reg(apic, APIC_PROCPRI, ppr);
+		kvm_lapic_set_reg(apic, APIC_PROCPRI, ppr);
 		if (ppr < old_ppr)
 			kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 	}
@@ -577,7 +578,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
 
 static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 {
-	apic_set_reg(apic, APIC_TASKPRI, tpr);
+	kvm_lapic_set_reg(apic, APIC_TASKPRI, tpr);
 	apic_update_ppr(apic);
 }
 
@@ -674,6 +675,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 		return false;
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_apic_match_dest);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
@@ -844,7 +846,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 
 		if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) {
 			if (trig_mode)
-				apic_set_vector(vector, apic->regs + APIC_TMR);
+				kvm_lapic_set_vector(vector, apic->regs + APIC_TMR);
 			else
 				apic_clear_vector(vector, apic->regs + APIC_TMR);
 		}
@@ -1109,7 +1111,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 	return container_of(dev, struct kvm_lapic, dev);
 }
 
-static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 		void *data)
 {
 	unsigned char alignment = offset & 0xf;
@@ -1146,6 +1148,7 @@ static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);
 
 static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
 {
@@ -1163,7 +1166,7 @@ static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 	if (!apic_mmio_in_range(apic, address))
 		return -EOPNOTSUPP;
 
-	apic_reg_read(apic, offset, len, data);
+	kvm_lapic_reg_read(apic, offset, len, data);
 
 	return 0;
 }
@@ -1348,7 +1351,7 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
 	}
 }
 
-static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 {
 	int ret = 0;
 
@@ -1380,7 +1383,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_DFR:
 		if (!apic_x2apic_mode(apic)) {
-			apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
+			kvm_lapic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
 			recalculate_apic_map(apic->vcpu->kvm);
 		} else
 			ret = 1;
@@ -1395,10 +1398,10 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 			u32 lvt_val;
 
-			for (i = 0; i < APIC_LVT_NUM; i++) {
+			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
 				lvt_val = kvm_apic_get_reg(apic,
 						       APIC_LVTT + 0x10 * i);
-				apic_set_reg(apic, APIC_LVTT + 0x10 * i,
+				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
 					     lvt_val | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
@@ -1409,14 +1412,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	}
 	case APIC_ICR:
 		/* No delay here, so we always clear the pending bit */
-		apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
+		kvm_lapic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
 		apic_send_ipi(apic);
 		break;
 
 	case APIC_ICR2:
 		if (!apic_x2apic_mode(apic))
 			val &= 0xff000000;
-		apic_set_reg(apic, APIC_ICR2, val);
+		kvm_lapic_set_reg(apic, APIC_ICR2, val);
 		break;
 
 	case APIC_LVT0:
@@ -1430,7 +1433,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			val |= APIC_LVT_MASKED;
 
 		val &= apic_lvt_mask[(reg - APIC_LVTT) >> 4];
-		apic_set_reg(apic, reg, val);
+		kvm_lapic_set_reg(apic, reg, val);
 
 		break;
 
@@ -1438,7 +1441,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
 		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
-		apic_set_reg(apic, APIC_LVTT, val);
+		kvm_lapic_set_reg(apic, APIC_LVTT, val);
 		apic_update_lvtt(apic);
 		break;
 
@@ -1447,14 +1450,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			break;
 
 		hrtimer_cancel(&apic->lapic_timer.timer);
-		apic_set_reg(apic, APIC_TMICT, val);
+		kvm_lapic_set_reg(apic, APIC_TMICT, val);
 		start_apic_timer(apic);
 		break;
 
 	case APIC_TDCR:
 		if (val & 4)
 			apic_debug("KVM_WRITE:TDCR %x\n", val);
-		apic_set_reg(apic, APIC_TDCR, val);
+		kvm_lapic_set_reg(apic, APIC_TDCR, val);
 		update_divide_count(apic);
 		break;
 
@@ -1467,7 +1470,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SELF_IPI:
 		if (apic_x2apic_mode(apic)) {
-			apic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
+			kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
 		} else
 			ret = 1;
 		break;
@@ -1479,6 +1482,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_debug("Local APIC Write to read-only register %x\n", reg);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_write);
 
 static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 			    gpa_t address, int len, const void *data)
@@ -1508,7 +1512,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 		apic_debug("%s: offset 0x%x with length 0x%x, and value is "
 			   "0x%x\n", __func__, offset, len, val);
 
-	apic_reg_write(apic, offset & 0xff0, val);
+	kvm_lapic_reg_write(apic, offset & 0xff0, val);
 
 	return 0;
 }
@@ -1516,7 +1520,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
 	if (kvm_vcpu_has_lapic(vcpu))
-		apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
+		kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
@@ -1528,10 +1532,10 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
-	apic_reg_write(vcpu->arch.apic, offset, val);
+	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
@@ -1670,28 +1674,28 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < APIC_LVT_NUM; i++)
-		apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
+		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
-		apic_set_reg(apic, APIC_LVT0,
+		kvm_lapic_set_reg(apic, APIC_LVT0,
 			     SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 	apic_manage_nmi_watchdog(apic, kvm_apic_get_reg(apic, APIC_LVT0));
 
-	apic_set_reg(apic, APIC_DFR, 0xffffffffU);
+	kvm_lapic_set_reg(apic, APIC_DFR, 0xffffffffU);
 	apic_set_spiv(apic, 0xff);
-	apic_set_reg(apic, APIC_TASKPRI, 0);
+	kvm_lapic_set_reg(apic, APIC_TASKPRI, 0);
 	if (!apic_x2apic_mode(apic))
 		kvm_apic_set_ldr(apic, 0);
-	apic_set_reg(apic, APIC_ESR, 0);
-	apic_set_reg(apic, APIC_ICR, 0);
-	apic_set_reg(apic, APIC_ICR2, 0);
-	apic_set_reg(apic, APIC_TDCR, 0);
-	apic_set_reg(apic, APIC_TMICT, 0);
+	kvm_lapic_set_reg(apic, APIC_ESR, 0);
+	kvm_lapic_set_reg(apic, APIC_ICR, 0);
+	kvm_lapic_set_reg(apic, APIC_ICR2, 0);
+	kvm_lapic_set_reg(apic, APIC_TDCR, 0);
+	kvm_lapic_set_reg(apic, APIC_TMICT, 0);
 	for (i = 0; i < 8; i++) {
-		apic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
-		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
-		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
+		kvm_lapic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
+		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
+		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
 	apic->irr_pending = vcpu->arch.apicv_active;
 	apic->isr_count = vcpu->arch.apicv_active ? 1 : 0;
@@ -2073,8 +2077,8 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -2091,10 +2095,10 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 		return 1;
 	}
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
@@ -2110,8 +2114,8 @@ int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
@@ -2122,10 +2126,10 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return 1;
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 41bdb35..7bf8184 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -7,6 +7,7 @@
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
+#define KVM_APIC_LVT_NUM	6
 
 struct kvm_timer {
 	struct hrtimer timer;
@@ -56,6 +57,12 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+void kvm_lapic_set_vector(int vec, void *bitmap);
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+		       void *data);
+bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+			   int short_hand, unsigned int dest, int dest_mode);
 
 void __kvm_apic_update_irr(u32 *pir, void *regs);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
@@ -96,6 +103,7 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
 void kvm_lapic_init(void);
 
+void kvm_lapic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val);
 static inline u32 kvm_apic_get_reg(struct kvm_lapic *apic, int reg_off)
 {
 	        return *((u32 *) (apic->regs + reg_off));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-03-18  6:09 ` [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:11   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks Suravee Suthikulpanit
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Adding function pointers in struct kvm_x86_ops for processor-specific
layer to provide hooks for when KVM initialize and un-initialize VM.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 10 +++++++++-
 virt/kvm/kvm_main.c             |  8 ++++----
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 44adbb8..4b0dd0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -828,6 +828,9 @@ struct kvm_x86_ops {
 	bool (*cpu_has_high_real_mode_segbase)(void);
 	void (*cpuid_update)(struct kvm_vcpu *vcpu);
 
+	int (*vm_init)(struct kvm *kvm);
+	void (*vm_uninit)(struct kvm *kvm);
+
 	/* Create, but do not attach this VCPU */
 	struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
 	void (*vcpu_free)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 429c3f5..4d2961d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7700,6 +7700,8 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
@@ -7724,7 +7726,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
-	return 0;
+	if (kvm_x86_ops->vm_init)
+		ret = kvm_x86_ops->vm_init(kvm);
+
+	return ret;
 }
 
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
@@ -7751,6 +7756,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_arch_vcpu_free(vcpu);
 
+	if (kvm_x86_ops->vm_uninit)
+		kvm_x86_ops->vm_uninit(kvm);
+
 	mutex_lock(&kvm->lock);
 	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
 		kvm->vcpus[i] = NULL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1ca0258..5460325 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -536,10 +536,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
 
-	r = kvm_arch_init_vm(kvm, type);
-	if (r)
-		goto out_err_no_disable;
-
 	r = hardware_enable_all();
 	if (r)
 		goto out_err_no_disable;
@@ -578,6 +574,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	atomic_set(&kvm->users_count, 1);
 	INIT_LIST_HEAD(&kvm->devices);
 
+	r = kvm_arch_init_vm(kvm, type);
+	if (r)
+		goto out_err;
+
 	r = kvm_init_mmu_notifier(kvm);
 	if (r)
 		goto out_err;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-03-18  6:09 ` [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions Suravee Suthikulpanit
  2016-03-18  6:09 ` [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:11   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Suravee Suthikulpanit
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Adding new function pointer in struct kvm_x86_ops, and calling them
from the kvm_arch_vcpu[blocking/unblocking].

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b0dd0f..87eac2a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -972,6 +972,10 @@ struct kvm_x86_ops {
 	 */
 	int (*pre_block)(struct kvm_vcpu *vcpu);
 	void (*post_block)(struct kvm_vcpu *vcpu);
+
+	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
+	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
+
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 };
@@ -1323,7 +1327,16 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->vcpu_blocking)
+		kvm_x86_ops->vcpu_blocking(vcpu);
+}
+
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->vcpu_unblocking)
+		kvm_x86_ops->vcpu_unblocking(vcpu);
+}
 
 #endif /* _ASM_X86_KVM_HOST_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:11   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

From: Radim Krčmář <rkrcmar@redhat.com>

AVIC has a use for kvm_vcpu_wake_up.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5276fe0..673749d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -651,6 +651,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5460325..e8fe787 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2048,13 +2048,8 @@ out:
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
 
 #ifndef CONFIG_S390
-/*
- * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
- */
-void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
-	int me;
-	int cpu = vcpu->cpu;
 	struct swait_queue_head *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
@@ -2063,6 +2058,18 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup;
 	}
 
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
+
+/*
+ * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
+ */
+void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+	int me;
+	int cpu = vcpu->cpu;
+
+	kvm_vcpu_wake_up(vcpu);
 	me = get_cpu();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 		if (kvm_arch_vcpu_should_kick(vcpu))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:11   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Introduce new AVIC VMCB registers.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/svm.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6136d99..66e26a0 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -78,7 +78,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 exit_int_info;
 	u32 exit_int_info_err;
 	u64 nested_ctl;
-	u8 reserved_4[16];
+	u64 avic_vapic_bar;
+	u8 reserved_4[8];
 	u32 event_inj;
 	u32 event_inj_err;
 	u64 nested_cr3;
@@ -88,7 +89,11 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 next_rip;
 	u8 insn_len;
 	u8 insn_bytes[15];
-	u8 reserved_6[800];
+	u64 avic_backing_page;	/* Offset 0xe0 */
+	u8 reserved_6[8];	/* Offset 0xe8 */
+	u64 avic_log_apic_id;	/* Offset 0xf0 */
+	u64 avic_phy_apic_id;	/* Offset 0xf8 */
+	u8 reserved_7[768];
 };
 
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (4 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 11:21   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

This patch introduces AVIC-related data structure, and AVIC
initialization code.

There are three main data structures for AVIC:
    * Virtual APIC (vAPIC) backing page (per-VCPU)
    * Physical APIC ID table (per-VM)
    * Logical APIC ID table (per-VM)

Currently, AVIC is disabled by default. Users can manually
enable AVIC via kernel boot option kvm-amd.avic=1 or during
kvm-amd module loading with parameter avic=1.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   8 ++
 arch/x86/include/asm/svm.h      |   3 +
 arch/x86/kvm/svm.c              | 232 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 242 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 87eac2a..2698e40 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -754,6 +754,14 @@ struct kvm_arch {
 
 	bool irqchip_split;
 	u8 nr_reserved_ioapic_pins;
+
+	/* Struct members for AVIC */
+	struct hlist_node hnode;
+	struct kvm *kvm;
+	u32 ldr_mode;
+	u32 avic_tag;
+	struct page *avic_log_apic_id_table_page;
+	struct page *avic_phy_apic_id_table_page;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 66e26a0..287e635 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define V_INTR_MASKING_SHIFT 24
 #define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
 
+#define AVIC_ENABLE_SHIFT 31
+#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
+
 #define SVM_INTERRUPT_SHADOW_MASK 1
 
 #define SVM_IOIO_STR_SHIFT 2
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c13a64b..621c948 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -14,6 +14,9 @@
  * the COPYING file in the top-level directory.
  *
  */
+
+#define pr_fmt(fmt) "SVM: " fmt
+
 #include <linux/kvm_host.h>
 
 #include "irq.h"
@@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define TSC_RATIO_MIN		0x0000000000000001ULL
 #define TSC_RATIO_MAX		0x000000ffffffffffULL
 
+#define AVIC_HPA_MASK	~((0xFFFULL << 52) || 0xFFF)
+
+/* NOTE: Current max index allowed for physical APIC ID table is 255 */
+#define AVIC_PHY_APIC_ID_MAX	0xFF
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -162,8 +170,20 @@ struct vcpu_svm {
 
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled	: 1;
+
+	struct page *avic_backing_page;
+	u64 *avic_phy_apic_id_cache;
+	bool avic_was_running;
 };
 
+#define AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK	(0xFF)
+#define AVIC_LOG_APIC_ID__VALID_MSK		(1 << 31)
+
+#define AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK	(0xFFULL)
+#define AVIC_PHY_APIC_ID__BACKING_PAGE_MSK	(0xFFFFFFFFFFULL << 12)
+#define AVIC_PHY_APIC_ID__IS_RUN_MSK		(1ULL << 62)
+#define AVIC_PHY_APIC_ID__VALID_MSK		(1ULL << 63)
+
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 #define TSC_RATIO_DEFAULT	0x0100000000ULL
 
@@ -205,6 +225,10 @@ module_param(npt, int, S_IRUGO);
 static int nested = true;
 module_param(nested, int, S_IRUGO);
 
+/* enable / disable AVIC */
+static int avic;
+module_param(avic, int, S_IRUGO);
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -234,6 +258,18 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
+
+static inline bool svm_vcpu_avic_enabled(struct vcpu_svm *svm)
+{
+	return (avic && (svm->vmcb->control.int_ctl & AVIC_ENABLE_MASK));
+}
+
+static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
+{
+	svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
+}
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
@@ -923,6 +959,13 @@ static __init int svm_hardware_setup(void)
 	} else
 		kvm_disable_tdp();
 
+	if (avic && (!npt_enabled || !boot_cpu_has(X86_FEATURE_AVIC)))
+		avic = false;
+
+	if (avic) {
+		printk(KERN_INFO "kvm: AVIC enabled\n");
+	}
+
 	return 0;
 
 err:
@@ -1000,6 +1043,24 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
+static void avic_init_vmcb(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
+	phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
+	phys_addr_t lpa = page_to_phys(vm_data->avic_log_apic_id_table_page);
+	phys_addr_t ppa = page_to_phys(vm_data->avic_phy_apic_id_table_page);
+
+	if (!vmcb)
+		return;
+
+	vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
+	vmcb->control.avic_log_apic_id = lpa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id = ppa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
+	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1113,6 +1174,139 @@ static void init_vmcb(struct vcpu_svm *svm)
 	mark_all_dirty(svm->vmcb);
 
 	enable_gif(svm);
+
+	if (avic)
+		avic_init_vmcb(svm);
+}
+
+static u64 *avic_get_phy_apic_id_entry(struct kvm_vcpu *vcpu, int index)
+{
+	u64 *avic_phy_ait;
+	struct kvm_arch *vm_data = &vcpu->kvm->arch;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	if (index >= 0xff)
+		return NULL;
+
+	avic_phy_ait = page_address(vm_data->avic_phy_apic_id_table_page);
+
+	return &avic_phy_ait[index];
+}
+
+static int avic_init_backing_page(struct kvm_vcpu *vcpu)
+{
+	u64 *entry, new_entry;
+	int id = vcpu->vcpu_id;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (id >= AVIC_PHY_APIC_ID_MAX)
+		return -EINVAL;
+
+	if (!svm->vcpu.arch.apic->regs)
+		return -EINVAL;
+
+	svm->avic_backing_page = virt_to_page(svm->vcpu.arch.apic->regs);
+
+	avic_init_vmcb(svm);
+
+	/* Setting AVIC backing page address in the phy APIC ID table */
+	entry = avic_get_phy_apic_id_entry(vcpu, id);
+	if (!entry)
+		return -EINVAL;
+
+	new_entry = READ_ONCE(*entry);
+	new_entry = (page_to_phys(svm->avic_backing_page) &
+		     AVIC_PHY_APIC_ID__BACKING_PAGE_MSK) |
+		     AVIC_PHY_APIC_ID__VALID_MSK;
+	WRITE_ONCE(*entry, new_entry);
+
+	svm->avic_phy_apic_id_cache = entry;
+
+	return 0;
+}
+
+static void avic_vm_uninit(struct kvm *kvm)
+{
+	unsigned long flags;
+	struct kvm_arch *vm_data = &kvm->arch;
+
+	if (vm_data->avic_log_apic_id_table_page)
+		__free_page(vm_data->avic_log_apic_id_table_page);
+	if (vm_data->avic_phy_apic_id_table_page)
+		__free_page(vm_data->avic_phy_apic_id_table_page);
+}
+
+static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!avic)
+		return;
+
+	if (svm->avic_phy_apic_id_cache)
+		svm->avic_phy_apic_id_cache = NULL;
+}
+
+static atomic_t avic_tag_gen = ATOMIC_INIT(1);
+
+static inline u32 avic_get_next_tag(void)
+{
+	u32 tag = atomic_read(&avic_tag_gen);
+
+	atomic_inc(&avic_tag_gen);
+	return tag;
+}
+
+static int avic_vm_init(struct kvm *kvm)
+{
+	int err = -ENOMEM;
+	struct kvm_arch *vm_data = &kvm->arch;
+	struct page *p_page;
+	struct page *l_page;
+
+	/**
+	 * Note:
+	 * AVIC hardware walks the nested page table to check permissions,
+	 * but does not use the SPA address specified in the leaf page
+	 * table entry since it uses  address in the AVIC_BACKING_PAGE pointer
+	 * field of the VMCB. Therefore, we set up the
+	 * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
+	 */
+	mutex_lock(&kvm->slots_lock);
+	err = __x86_set_memory_region(kvm,
+				      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+				      APIC_DEFAULT_PHYS_BASE,
+				      PAGE_SIZE);
+	mutex_unlock(&kvm->slots_lock);
+	if (err)
+		goto free_avic;
+	kvm->arch.apic_access_page_done = true;
+
+	/* Allocating physical APIC ID table (4KB) */
+	p_page = alloc_page(GFP_KERNEL);
+	if (!p_page)
+		goto free_avic;
+
+	vm_data->avic_phy_apic_id_table_page = p_page;
+	clear_page(page_address(p_page));
+
+	/* Allocating logical APIC ID table (4KB) */
+	l_page = alloc_page(GFP_KERNEL);
+	if (!l_page)
+		goto free_avic;
+
+	vm_data->avic_log_apic_id_table_page = l_page;
+	clear_page(page_address(l_page));
+
+	vm_data->avic_tag = avic_get_next_tag();
+
+	return 0;
+
+free_avic:
+	avic_vm_uninit(kvm);
+	return err;
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1131,6 +1325,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+
+	if (svm_vcpu_avic_enabled(svm) && !init_event)
+		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
 static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
@@ -1169,6 +1366,14 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!hsave_page)
 		goto free_page3;
 
+	if (avic) {
+		err = avic_init_backing_page(&svm->vcpu);
+		if (err) {
+			avic_vcpu_uninit(&svm->vcpu);
+			goto free_page4;
+		}
+	}
+
 	svm->nested.hsave = page_address(hsave_page);
 
 	svm->msrpm = page_address(msrpm_pages);
@@ -1187,6 +1392,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	return &svm->vcpu;
 
+free_page4:
+	__free_page(hsave_page);
 free_page3:
 	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
@@ -1209,6 +1416,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+	avic_vcpu_uninit(vcpu);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -3371,6 +3579,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
 	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
 	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
+	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
 	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
 	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
 	pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
@@ -3602,11 +3811,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 
 static bool svm_get_enable_apicv(void)
 {
-	return false;
+	return avic;
 }
 
+static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
+}
+
+static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
+{
+}
+
+/* Note: Currently only used by Hyper-V. */
 static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = svm->vmcb;
+
+	if (!avic)
+		return;
+
+	vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
 }
 
 static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
@@ -4318,6 +4543,9 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.vcpu_free = svm_free_vcpu,
 	.vcpu_reset = svm_vcpu_reset,
 
+	.vm_init = avic_vm_init,
+	.vm_uninit = avic_vm_uninit,
+
 	.prepare_guest_switch = svm_prepare_guest_switch,
 	.vcpu_load = svm_vcpu_load,
 	.vcpu_put = svm_vcpu_put,
@@ -4375,6 +4603,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.sync_pir_to_irr = svm_sync_pir_to_irr,
+	.hwapic_irr_update = svm_hwapic_irr_update,
+	.hwapic_isr_update = svm_hwapic_isr_update,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (5 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:22   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC Suravee Suthikulpanit
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

This patch introduces a new mechanism to inject interrupt using AVIC.
Since VINTR is not supported when enable AVIC, we need to inject
interrupt via APIC backing page instead.

This patch also adds support for AVIC doorbell, which is used by
KVM to signal a running vcpu to check IRR for injected interrupts.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 621c948..8e31ad3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -71,6 +71,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define SVM_FEATURE_DECODE_ASSIST  (1 <<  7)
 #define SVM_FEATURE_PAUSE_FILTER   (1 << 10)
 
+#define SVM_AVIC_DOORBELL	0xc001011b
+
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
 #define NESTED_EXIT_DONE	1	/* Exit caused nested vmexit  */
 #define NESTED_EXIT_CONTINUE	2	/* Further checks needed      */
@@ -217,6 +219,8 @@ static bool npt_enabled = true;
 static bool npt_enabled;
 #endif
 
+static struct kvm_x86_ops svm_x86_ops;
+
 /* allow nested paging (virtualized MMU) for all guests */
 static int npt = true;
 module_param(npt, int, S_IRUGO);
@@ -291,6 +295,18 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_svm, vcpu);
 }
 
+static inline bool avic_vcpu_is_running(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 *entry;
+
+	entry = svm->avic_phy_apic_id_cache;
+	if (!entry)
+		return false;
+
+	return (READ_ONCE(*entry) & AVIC_PHY_APIC_ID__IS_RUN_MSK);
+}
+
 static void recalc_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *c, *h;
@@ -964,6 +980,8 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+	} else {
+		svm_x86_ops.deliver_posted_interrupt = NULL;
 	}
 
 	return 0;
@@ -2877,8 +2895,10 @@ static int clgi_interception(struct vcpu_svm *svm)
 	disable_gif(svm);
 
 	/* After a CLGI no interrupts should come */
-	svm_clear_vintr(svm);
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	if (!svm_vcpu_avic_enabled(svm)) {
+		svm_clear_vintr(svm);
+		svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	}
 
 	mark_dirty(svm->vmcb, VMCB_INTR);
 
@@ -3453,6 +3473,9 @@ static int msr_interception(struct vcpu_svm *svm)
 
 static int interrupt_window_interception(struct vcpu_svm *svm)
 {
+	if (svm_vcpu_avic_enabled(svm))
+		BUG();
+
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	svm_clear_vintr(svm);
 	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
@@ -3767,6 +3790,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
 	struct vmcb_control_area *control;
 
+	/* The following fields are ignored when AVIC is enabled */
 	control = &svm->vmcb->control;
 	control->int_vector = irq;
 	control->int_ctl &= ~V_INTR_PRIO_MASK;
@@ -3844,6 +3868,20 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	kvm_lapic_set_vector(vec, svm->vcpu.arch.apic->regs + APIC_IRR);
+	smp_mb__after_atomic();
+
+	if (avic_vcpu_is_running(vcpu))
+		wrmsrl(SVM_AVIC_DOORBELL,
+		       __default_cpu_present_to_apicid(vcpu->cpu));
+	else
+		kvm_vcpu_wake_up(vcpu);
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3904,6 +3942,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 	 * get that intercept, this function will be called again though and
 	 * we'll get the vintr intercept.
 	 */
+	if (svm_vcpu_avic_enabled(svm))
+		return;
+
 	if (gif_set(svm) && nested_svm_intr(svm)) {
 		svm_set_vintr(svm);
 		svm_inject_irq(svm, 0x0);
@@ -4638,6 +4679,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.sched_in = svm_sched_in,
 
 	.pmu_ops = &amd_pmu_ops,
+	.deliver_posted_interrupt = svm_deliver_avic_intr,
 };
 
 static int __init svm_init(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (6 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 10:24   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 09/12] svm: Add VMEXIT handlers " Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

Introduce trace events for AMD AVIC incomplete IPI vmexit, and
AVIC unaccelerate access vmexit.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/trace.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c   |  2 ++
 2 files changed, 59 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index ad9f6a2..3c85a3d 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1288,6 +1288,63 @@ TRACE_EVENT(kvm_hv_stimer_cleanup,
 		  __entry->vcpu_id, __entry->timer_index)
 );
 
+/*
+ * Tracepoint for AMD AVIC
+ */
+TRACE_EVENT(kvm_avic_incomplete_ipi,
+	    TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index),
+	    TP_ARGS(vcpu, icrh, icrl, id, index),
+
+	TP_STRUCT__entry(
+		__field(u32, vcpu)
+		__field(u32, icrh)
+		__field(u32, icrl)
+		__field(u32, id)
+		__field(u32, index)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->icrh = icrh;
+		__entry->icrl = icrl;
+		__entry->id = id;
+		__entry->index = index;
+	),
+
+	TP_printk("vcpu=%#x, icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
+		  __entry->vcpu, __entry->icrh, __entry->icrl,
+		  __entry->id, __entry->index)
+);
+
+TRACE_EVENT(kvm_avic_unaccelerated_access,
+	    TP_PROTO(u32 vcpu, u32 offset, bool ft, bool rw, u32 vec),
+	    TP_ARGS(vcpu, offset, ft, rw, vec),
+
+	TP_STRUCT__entry(
+		__field(u32, vcpu)
+		__field(u32, offset)
+		__field(bool, ft)
+		__field(bool, rw)
+		__field(u32, vec)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->offset = offset;
+		__entry->ft = ft;
+		__entry->rw = rw;
+		__entry->vec = vec;
+	),
+
+	TP_printk("vcpu=%#x, offset=%#x(%s), %s, %s, vec=%#x\n",
+		  __entry->vcpu,
+		  __entry->offset,
+		  __print_symbolic(__entry->offset, kvm_trace_symbol_apic),
+		  __entry->ft ? "trap" : "fault",
+		  __entry->rw ? "write" : "read",
+		  __entry->vec)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4d2961d..775de1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8395,3 +8395,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 09/12] svm: Add VMEXIT handlers for AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (7 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 11:11   ` Paolo Bonzini
  2016-03-18  6:09 ` [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Introduce VMEXIT handlers, avic_incp_ipi_interception() and
avic_noaccel_interception().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/uapi/asm/svm.h |   9 +-
 arch/x86/kvm/lapic.h            |   3 +
 arch/x86/kvm/svm.c              | 244 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 255 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8a4add8..131b0b9 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -73,6 +73,8 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_NPF           0x400
+#define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
+#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
 
 #define SVM_EXIT_ERR           -1
 
@@ -107,8 +109,10 @@
 	{ SVM_EXIT_SMI,         "smi" }, \
 	{ SVM_EXIT_INIT,        "init" }, \
 	{ SVM_EXIT_VINTR,       "vintr" }, \
+	{ SVM_EXIT_CR0_SEL_WRITE, "cr0_sec_write" }, \
 	{ SVM_EXIT_CPUID,       "cpuid" }, \
 	{ SVM_EXIT_INVD,        "invd" }, \
+	{ SVM_EXIT_PAUSE,       "pause" }, \
 	{ SVM_EXIT_HLT,         "hlt" }, \
 	{ SVM_EXIT_INVLPG,      "invlpg" }, \
 	{ SVM_EXIT_INVLPGA,     "invlpga" }, \
@@ -127,7 +131,10 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
-	{ SVM_EXIT_NPF,         "npf" }
+	{ SVM_EXIT_NPF,         "npf" }, \
+	{ SVM_EXIT_RSM,         "rsm" }, \
+	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
+	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }
 
 
 #endif /* _UAPI__SVM_H */
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7bf8184..ed23af0 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -9,6 +9,9 @@
 #define KVM_APIC_SIPI		1
 #define KVM_APIC_LVT_NUM	6
 
+#define KVM_APIC_SHORT_MASK	0xc0000
+#define KVM_APIC_DEST_MASK	0x800
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8e31ad3..6303147 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3508,6 +3508,248 @@ static int mwait_interception(struct vcpu_svm *svm)
 	return nop_interception(svm);
 }
 
+enum avic_ipi_failure_cause {
+	AVIC_IPI_FAILURE_INVALID_INT_TYPE,
+	AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
+	AVIC_IPI_FAILURE_INVALID_TARGET,
+	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
+};
+
+static int avic_incomplete_ipi_interception(struct vcpu_svm *svm)
+{
+	u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
+	u32 icrl = svm->vmcb->control.exit_info_1;
+	u32 id = svm->vmcb->control.exit_info_2 >> 32;
+	u32 index = svm->vmcb->control.exit_info_2 && 0xFF;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+	trace_kvm_avic_incomplete_ipi(svm->vcpu.vcpu_id, icrh, icrl, id, index);
+
+	switch (id) {
+	case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
+		/*
+		 * AVIC hardware handles the generation of
+		 * IPIs when the specified Message Type is Fixed
+		 * (also known as fixed delivery mode) and
+		 * the Trigger Mode is edge-triggered. The hardware
+		 * also supports self and broadcast delivery modes
+		 * specified via the Destination Shorthand(DSH)
+		 * field of the ICRL. Logical and physical APIC ID
+		 * formats are supported. All other IPI types cause
+		 * a #VMEXIT, which needs to emulated.
+		 */
+		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
+		break;
+	case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING: {
+		int i;
+		struct kvm_vcpu *vcpu;
+		struct kvm *kvm = svm->vcpu.kvm;
+		struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+		/*
+		 * At this point, we expect that the AVIC HW has already
+		 * set the appropriate IRR bits on the valid target
+		 * vcpus. So, we just need to kick the appropriate vcpu.
+		 */
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			bool m = kvm_apic_match_dest(vcpu, apic,
+						     icrl & KVM_APIC_SHORT_MASK,
+						     GET_APIC_DEST_FIELD(icrh),
+						     icrl & KVM_APIC_DEST_MASK);
+
+			if (m && !avic_vcpu_is_running(vcpu))
+				kvm_vcpu_wake_up(vcpu);
+		}
+		break;
+	}
+	case AVIC_IPI_FAILURE_INVALID_TARGET:
+		break;
+	case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
+		WARN_ONCE(1, "Invalid backing page\n");
+		break;
+	default:
+		pr_err("Unknown IPI interception\n");
+	}
+
+	return 1;
+}
+
+static u32 *avic_get_log_apic_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
+{
+	struct kvm_arch *vm_data = &vcpu->kvm->arch;
+	int index;
+	u32 *avic_log_ait;
+
+	if (flat) { /* flat */
+		if (mda > 7)
+			return NULL;
+		index = mda;
+	} else { /* cluster */
+		int apic_id = mda & 0xf;
+		int cluster_id = (mda & 0xf0) >> 8;
+
+		if (apic_id > 4 || cluster_id >= 0xf)
+			return NULL;
+		index = (cluster_id << 2) + apic_id;
+	}
+	avic_log_ait = (u32 *) page_address(vm_data->avic_log_apic_id_table_page);
+
+	return &avic_log_ait[index];
+}
+
+static int avic_handle_ldr_write(struct kvm_vcpu *vcpu, u8 g_phy_apic_id,
+				 u8 log_apic_id)
+{
+	u32 mod;
+	u32 *entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!svm)
+		return -EINVAL;
+
+	mod = (kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR) >> 28) & 0xf;
+	entry = avic_get_log_apic_id_entry(vcpu, log_apic_id, (mod == 0xf));
+	if (!entry)
+		return -EINVAL;
+
+	*entry &= ~AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK;
+	*entry |= (g_phy_apic_id & AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK);
+	*entry |= AVIC_LOG_APIC_ID__VALID_MSK;
+
+	return 0;
+}
+
+static int avic_unaccel_trap_write(struct vcpu_svm *svm)
+{
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+	u32 reg = kvm_apic_get_reg(apic, offset);
+
+	switch (offset) {
+	case APIC_ID: {
+		u32 aid = (reg >> 24) & 0xff;
+		u64 *o_ent = avic_get_phy_apic_id_entry(&svm->vcpu,
+							svm->vcpu.vcpu_id);
+		u64 *n_ent = avic_get_phy_apic_id_entry(&svm->vcpu, aid);
+
+		if (!n_ent || !o_ent)
+			return 0;
+
+		/* We need to move phy_apic_entry to new offset */
+		*n_ent = *o_ent;
+		*((u64 *)o_ent) = 0ULL;
+		svm->avic_phy_apic_id_cache = n_ent;
+		break;
+	}
+	case APIC_LDR: {
+		int ret, lid;
+		int dlid = (reg >> 24) & 0xff;
+
+		if (!dlid)
+			return 0;
+
+		lid = ffs(dlid) - 1;
+		ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
+		if (ret)
+			return 0;
+
+		break;
+	}
+	case APIC_DFR: {
+		struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
+		u32 mod = (reg >> 28) & 0xf;
+
+		/*
+		 * We assume that all local APICs are using the same type.
+		 * If this changes, we need to rebuild the AVIC logical
+		 * APID id table with subsequent write to APIC_LDR.
+		 */
+		if (vm_data->ldr_mode != mod) {
+			clear_page(page_address(vm_data->avic_log_apic_id_table_page));
+			vm_data->ldr_mode = mod;
+		}
+		break;
+	}
+	default:
+		break;
+	}
+
+	kvm_lapic_reg_write(apic, offset, reg);
+
+	return 1;
+}
+
+static bool is_avic_unaccelerated_access_trap(u32 offset)
+{
+	bool ret = false;
+
+	switch (offset) {
+	case APIC_ID:
+	case APIC_EOI:
+	case APIC_RRR:
+	case APIC_LDR:
+	case APIC_DFR:
+	case APIC_SPIV:
+	case APIC_ESR:
+	case APIC_ICR:
+	case APIC_LVTT:
+	case APIC_LVTTHMR:
+	case APIC_LVTPC:
+	case APIC_LVT0:
+	case APIC_LVT1:
+	case APIC_LVTERR:
+	case APIC_TMICT:
+	case APIC_TDCR:
+		ret = true;
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+#define AVIC_UNACCEL_ACCESS_WRITE_MSK		1
+#define AVIC_UNACCEL_ACCESS_OFFSET_MSK		0xFF0
+#define AVIC_UNACCEL_ACCESS_VECTOR_MSK		0xFFFFFFFF
+
+static int avic_unaccelerated_access_interception(struct vcpu_svm *svm)
+{
+	int ret = 0;
+	u32 offset = svm->vmcb->control.exit_info_1 &
+		     AVIC_UNACCEL_ACCESS_OFFSET_MSK;
+	u32 vector = svm->vmcb->control.exit_info_2 &
+		     AVIC_UNACCEL_ACCESS_VECTOR_MSK;
+	bool write = (svm->vmcb->control.exit_info_1 >> 32) &
+		     AVIC_UNACCEL_ACCESS_WRITE_MSK;
+	bool trap = is_avic_unaccelerated_access_trap(offset);
+
+	trace_kvm_avic_unaccelerated_access(svm->vcpu.vcpu_id, offset,
+					    trap, write, vector);
+
+	/**
+	 * AVIC does not support x2APIC registers, and we only advertise
+	 * xAPIC when enable AVIC. Therefore, access to these registers
+	 * will not be supported.
+	 */
+	if (offset >= 0x400) {
+		WARN(1, "Unsupported APIC offset %#x\n", offset);
+		return ret;
+	}
+
+	if (trap) {
+		/* Handling Trap */
+		if (!write) /* Trap read should never happens */
+			BUG();
+		ret = avic_unaccel_trap_write(svm);
+	} else {
+		/* Handling Fault */
+		ret = (emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
+	}
+
+	return ret;
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3571,6 +3813,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_NPF]				= pf_interception,
 	[SVM_EXIT_RSM]                          = emulate_on_interception,
+	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
+	[SVM_EXIT_AVIC_UNACCELERATED_ACCESS]	= avic_unaccelerated_access_interception,
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (8 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 09/12] svm: Add VMEXIT handlers " Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 20:59   ` Radim Krčmář
  2016-03-18  6:09 ` [PART1 RFC v3 11/12] svm: Do not intercept CR8 " Suravee Suthikulpanit
  2016-03-18  6:09 ` [PART1 RFC v3 12/12] svm: Manage vcpu load/unload " Suravee Suthikulpanit
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Since AVIC only virtualizes xAPIC hardware for the guest, we need to:
    * Intercept APIC BAR msr accesses to disable x2APIC
    * Intercept CPUID access to not advertise x2APIC support
    * Hide x2APIC support when checking via KVM ioctl

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6303147..ba84d57 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -209,6 +209,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
+	{ .index = MSR_IA32_APICBASE,			.always = false },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
@@ -841,7 +842,7 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
 	msrpm[offset] = tmp;
 }
 
-static void svm_vcpu_init_msrpm(u32 *msrpm)
+static void svm_vcpu_init_msrpm(struct vcpu_svm *svm, u32 *msrpm)
 {
 	int i;
 
@@ -853,6 +854,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
 		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
 	}
+
+	if (svm_vcpu_avic_enabled(svm))
+		set_msr_interception(msrpm, MSR_IA32_APICBASE, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -1394,18 +1398,18 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	svm->nested.hsave = page_address(hsave_page);
 
-	svm->msrpm = page_address(msrpm_pages);
-	svm_vcpu_init_msrpm(svm->msrpm);
-
-	svm->nested.msrpm = page_address(nested_msrpm_pages);
-	svm_vcpu_init_msrpm(svm->nested.msrpm);
-
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
+	svm->msrpm = page_address(msrpm_pages);
+	svm_vcpu_init_msrpm(svm, svm->msrpm);
+
+	svm->nested.msrpm = page_address(nested_msrpm_pages);
+	svm_vcpu_init_msrpm(svm, svm->nested.msrpm);
+
 	svm_init_osvw(&svm->vcpu);
 
 	return &svm->vcpu;
@@ -3308,6 +3312,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = 0x1E;
 		}
 		break;
+	case MSR_IA32_APICBASE:
+		if (svm_vcpu_avic_enabled(svm)) {
+			/* Note:
+			 * For AVIC, we need to disable X2APIC
+			 * and enable XAPIC
+			 */
+			kvm_get_msr_common(vcpu, msr_info);
+			msr_info->data &= ~X2APIC_ENABLE;
+			msr_info->data |= XAPIC_ENABLE;
+			break;
+		}
+		/* Follow through if not AVIC */
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -3436,6 +3452,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_VM_IGNNE:
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_IA32_APICBASE:
+		if (svm_vcpu_avic_enabled(svm))
+			avic_update_vapic_bar(to_svm(vcpu), data);
+		/* Follow through */
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
@@ -4554,11 +4574,26 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 
 	/* Update nrips enabled cache */
 	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
+
+	/* Do not support X2APIC when enable AVIC */
+	if (svm_vcpu_avic_enabled(svm)) {
+		int i;
+
+		for (i = 0 ; i < vcpu->arch.cpuid_nent ; i++) {
+			if (vcpu->arch.cpuid_entries[i].function == 1)
+				vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
+		}
+	}
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
 	switch (func) {
+	case 0x00000001:
+		/* Do not support X2APIC when enable AVIC */
+		if (avic)
+			entry->ecx &= ~(1 << 21);
+		break;
 	case 0x80000001:
 		if (nested)
 			entry->ecx |= (1 << 2); /* Set SVM bit */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 11/12] svm: Do not intercept CR8 when enable AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (9 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 21:10   ` Radim Krčmář
  2016-03-18  6:09 ` [PART1 RFC v3 12/12] svm: Manage vcpu load/unload " Suravee Suthikulpanit
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

When enable AVIC:
    * Do not intercept CR8 since this should be handled by AVIC HW.
    * Also update TPR in APIC backing page when syncing CR8 before VMRUN

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba84d57..d5418c3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -984,6 +984,8 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+
+		svm_x86_ops.update_cr8_intercept = NULL;
 	} else {
 		svm_x86_ops.deliver_posted_interrupt = NULL;
 	}
@@ -1097,7 +1099,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
-	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+	if (!svm_vcpu_avic_enabled(svm))
+		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -4080,7 +4083,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
+	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
+	     svm_vcpu_avic_enabled(svm))
 		return;
 
 	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
@@ -4271,9 +4275,12 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
 		return;
 
-	cr8 = kvm_get_cr8(vcpu);
+	cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
 	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
-	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+	svm->vmcb->control.int_ctl |= cr8;
+
+	if (svm_vcpu_avic_enabled(svm))
+		kvm_lapic_set_reg(svm->vcpu.arch.apic, APIC_TASKPRI, (u32)cr8 << 4);
 }
 
 static void svm_complete_interrupts(struct vcpu_svm *svm)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (10 preceding siblings ...)
  2016-03-18  6:09 ` [PART1 RFC v3 11/12] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-03-18  6:09 ` Suravee Suthikulpanit
  2016-03-18 21:44   ` Radim Krčmář
  11 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  6:09 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

When a vcpu is loaded/unloaded to a physical core, we need to update
host physical APIC ID information in the Physical APIC-ID table
accordingly.

Also, when vCPU is blocking/un-blocking (due to halt instruction),
we need to make sure that the is-running bit in set accordingly in the
physical APIC-ID table.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d5418c3..c5e8100 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -35,6 +35,7 @@
 #include <linux/trace_events.h>
 #include <linux/slab.h>
 
+#include <asm/apic.h>
 #include <asm/perf_event.h>
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
@@ -1334,6 +1335,110 @@ free_avic:
 	return err;
 }
 
+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
+{
+	int h_phy_apic_id;
+	u64 *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret = 0;
+
+	if (!svm_vcpu_avic_enabled(svm))
+		return 0;
+
+	if (!svm)
+		return -EINVAL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	h_phy_apic_id = __default_cpu_present_to_apicid(cpu);
+
+	if (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX)
+		return -EINVAL;
+
+	entry = svm->avic_phy_apic_id_cache;
+	if (!entry)
+		return -EINVAL;
+
+	if (is_load) {
+		new_entry = READ_ONCE(*entry);
+
+		BUG_ON(new_entry & AVIC_PHY_APIC_ID__IS_RUN_MSK);
+
+		new_entry &= ~AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK;
+		new_entry |= (h_phy_apic_id & AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK);
+
+		/**
+		 * Restore AVIC running flag if it was set during
+		 * vcpu unload.
+		 */
+		if (svm->avic_was_running)
+			new_entry |= AVIC_PHY_APIC_ID__IS_RUN_MSK;
+		else
+			new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;
+
+		WRITE_ONCE(*entry, new_entry);
+
+	} else {
+		new_entry = READ_ONCE(*entry);
+
+		/**
+		 * This handles the case when vcpu is scheduled out
+		 * and has not yet not called blocking. We save the
+		 * AVIC running flag so that we can restore later.
+		 */
+		if (new_entry & AVIC_PHY_APIC_ID__IS_RUN_MSK) {
+			svm->avic_was_running = true;
+			new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;
+			WRITE_ONCE(*entry, new_entry);
+		} else {
+			svm->avic_was_running = false;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * This function is called during VCPU halt/unhalt.
+ */
+static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+	int ret = 0;
+	int h_phy_apic_id;
+	u64 *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!svm_vcpu_avic_enabled(svm))
+		return 0;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	h_phy_apic_id = __default_cpu_present_to_apicid(vcpu->cpu);
+
+	if (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX)
+		return -EINVAL;
+
+	entry = svm->avic_phy_apic_id_cache;
+	if (!entry)
+		return -EINVAL;
+
+	if (is_run) {
+		/* Handle vcpu unblocking after HLT */
+		new_entry = READ_ONCE(*entry);
+		new_entry |= AVIC_PHY_APIC_ID__IS_RUN_MSK;
+		WRITE_ONCE(*entry, new_entry);
+	} else {
+		/* Handle vcpu blocking due to HLT */
+		new_entry = READ_ONCE(*entry);
+		new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;
+		WRITE_ONCE(*entry, new_entry);
+	}
+
+	return ret;
+}
+
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1476,6 +1581,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	/* This assumes that the kernel never uses MSR_TSC_AUX */
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+
+	avic_vcpu_load(vcpu, cpu, true);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1483,6 +1590,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	int i;
 
+	avic_vcpu_load(vcpu, 0, false);
+
 	++vcpu->stat.host_state_reload;
 	kvm_load_ldt(svm->host.ldt);
 #ifdef CONFIG_X86_64
@@ -1498,6 +1607,16 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
 
+static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	avic_set_running(vcpu, false);
+}
+
+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	avic_set_running(vcpu, true);
+}
+
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
 	return to_svm(vcpu)->vmcb->save.rflags;
@@ -4876,6 +4995,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.prepare_guest_switch = svm_prepare_guest_switch,
 	.vcpu_load = svm_vcpu_load,
 	.vcpu_put = svm_vcpu_put,
+	.vcpu_blocking = svm_vcpu_blocking,
+	.vcpu_unblocking = svm_vcpu_unblocking,
 
 	.update_bp_intercept = update_bp_intercept,
 	.get_msr = svm_get_msr,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-18  6:09 ` [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks Suravee Suthikulpanit
@ 2016-03-18 10:11   ` Paolo Bonzini
  2016-03-29  5:27     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:11 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> Adding function pointers in struct kvm_x86_ops for processor-specific
> layer to provide hooks for when KVM initialize and un-initialize VM.

This is not the only thing your patch is doing, and the "other" change
definitely needs a lot more explanation about why you did it and how you
audited the code to ensure that it is safe.

Paolo

> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 +++
>  arch/x86/kvm/x86.c              | 10 +++++++++-
>  virt/kvm/kvm_main.c             |  8 ++++----
>  3 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 44adbb8..4b0dd0f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -828,6 +828,9 @@ struct kvm_x86_ops {
>  	bool (*cpu_has_high_real_mode_segbase)(void);
>  	void (*cpuid_update)(struct kvm_vcpu *vcpu);
>  
> +	int (*vm_init)(struct kvm *kvm);
> +	void (*vm_uninit)(struct kvm *kvm);
> +
>  	/* Create, but do not attach this VCPU */
>  	struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
>  	void (*vcpu_free)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 429c3f5..4d2961d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7700,6 +7700,8 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
> +	int ret = 0;
> +
>  	if (type)
>  		return -EINVAL;
>  
> @@ -7724,7 +7726,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
>  
> -	return 0;
> +	if (kvm_x86_ops->vm_init)
> +		ret = kvm_x86_ops->vm_init(kvm);
> +
> +	return ret;
>  }
>  
>  static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
> @@ -7751,6 +7756,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
>  	kvm_for_each_vcpu(i, vcpu, kvm)
>  		kvm_arch_vcpu_free(vcpu);
>  
> +	if (kvm_x86_ops->vm_uninit)
> +		kvm_x86_ops->vm_uninit(kvm);
> +
>  	mutex_lock(&kvm->lock);
>  	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
>  		kvm->vcpus[i] = NULL;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1ca0258..5460325 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -536,10 +536,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	if (!kvm)
>  		return ERR_PTR(-ENOMEM);
>  
> -	r = kvm_arch_init_vm(kvm, type);
> -	if (r)
> -		goto out_err_no_disable;
> -
>  	r = hardware_enable_all();
>  	if (r)
>  		goto out_err_no_disable;
> @@ -578,6 +574,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	atomic_set(&kvm->users_count, 1);
>  	INIT_LIST_HEAD(&kvm->devices);
>  
> +	r = kvm_arch_init_vm(kvm, type);
> +	if (r)
> +		goto out_err;
> +
>  	r = kvm_init_mmu_notifier(kvm);
>  	if (r)
>  		goto out_err;
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
  2016-03-18  6:09 ` [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks Suravee Suthikulpanit
@ 2016-03-18 10:11   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:11 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> Adding new function pointer in struct kvm_x86_ops, and calling them
> from the kvm_arch_vcpu[blocking/unblocking].
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4b0dd0f..87eac2a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -972,6 +972,10 @@ struct kvm_x86_ops {
>  	 */
>  	int (*pre_block)(struct kvm_vcpu *vcpu);
>  	void (*post_block)(struct kvm_vcpu *vcpu);
> +
> +	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
> +	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
> +
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
>  };
> @@ -1323,7 +1327,16 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>  		     struct kvm_lapic_irq *irq);
>  
> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_x86_ops->vcpu_blocking)
> +		kvm_x86_ops->vcpu_blocking(vcpu);
> +}
> +
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_x86_ops->vcpu_unblocking)
> +		kvm_x86_ops->vcpu_unblocking(vcpu);
> +}
>  
>  #endif /* _ASM_X86_KVM_HOST_H */
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  2016-03-18  6:09 ` [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Suravee Suthikulpanit
@ 2016-03-18 10:11   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:11 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> From: Radim Krčmář <rkrcmar@redhat.com>
> 
> AVIC has a use for kvm_vcpu_wake_up.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c      | 19 +++++++++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5276fe0..673749d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -651,6 +651,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5460325..e8fe787 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2048,13 +2048,8 @@ out:
>  EXPORT_SYMBOL_GPL(kvm_vcpu_block);
>  
>  #ifndef CONFIG_S390
> -/*
> - * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
> - */
> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>  {
> -	int me;
> -	int cpu = vcpu->cpu;
>  	struct swait_queue_head *wqp;
>  
>  	wqp = kvm_arch_vcpu_wq(vcpu);
> @@ -2063,6 +2058,18 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>  		++vcpu->stat.halt_wakeup;
>  	}
>  
> +}
> +EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
> +
> +/*
> + * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
> + */
> +void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> +{
> +	int me;
> +	int cpu = vcpu->cpu;
> +
> +	kvm_vcpu_wake_up(vcpu);
>  	me = get_cpu();
>  	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
>  		if (kvm_arch_vcpu_should_kick(vcpu))
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers
  2016-03-18  6:09 ` [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
@ 2016-03-18 10:11   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:11 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> Introduce new AVIC VMCB registers.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/svm.h | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 6136d99..66e26a0 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -78,7 +78,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u32 exit_int_info;
>  	u32 exit_int_info_err;
>  	u64 nested_ctl;
> -	u8 reserved_4[16];
> +	u64 avic_vapic_bar;
> +	u8 reserved_4[8];
>  	u32 event_inj;
>  	u32 event_inj_err;
>  	u64 nested_cr3;
> @@ -88,7 +89,11 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u64 next_rip;
>  	u8 insn_len;
>  	u8 insn_bytes[15];
> -	u8 reserved_6[800];
> +	u64 avic_backing_page;	/* Offset 0xe0 */
> +	u8 reserved_6[8];	/* Offset 0xe8 */
> +	u64 avic_log_apic_id;	/* Offset 0xf0 */
> +	u64 avic_phy_apic_id;	/* Offset 0xf8 */
> +	u8 reserved_7[768];
>  };
>  
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-03-18 10:22   ` Paolo Bonzini
  2016-04-05 13:26     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:22 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> This patch introduces a new mechanism to inject interrupt using AVIC.
> Since VINTR is not supported when enable AVIC, we need to inject
> interrupt via APIC backing page instead.
> 
> This patch also adds support for AVIC doorbell, which is used by
> KVM to signal a running vcpu to check IRR for injected interrupts.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Looks good, but I think it breaks nested virtualization.  See below.

> ---
>  arch/x86/kvm/svm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 621c948..8e31ad3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -71,6 +71,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
>  #define SVM_FEATURE_DECODE_ASSIST  (1 <<  7)
>  #define SVM_FEATURE_PAUSE_FILTER   (1 << 10)
>  
> +#define SVM_AVIC_DOORBELL	0xc001011b
> +
>  #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
>  #define NESTED_EXIT_DONE	1	/* Exit caused nested vmexit  */
>  #define NESTED_EXIT_CONTINUE	2	/* Further checks needed      */
> @@ -217,6 +219,8 @@ static bool npt_enabled = true;
>  static bool npt_enabled;
>  #endif
>  
> +static struct kvm_x86_ops svm_x86_ops;
> +
>  /* allow nested paging (virtualized MMU) for all guests */
>  static int npt = true;
>  module_param(npt, int, S_IRUGO);
> @@ -291,6 +295,18 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
>  	return container_of(vcpu, struct vcpu_svm, vcpu);
>  }
>  
> +static inline bool avic_vcpu_is_running(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	u64 *entry;
> +
> +	entry = svm->avic_phy_apic_id_cache;
> +	if (!entry)
> +		return false;
> +
> +	return (READ_ONCE(*entry) & AVIC_PHY_APIC_ID__IS_RUN_MSK);

AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK

> +}
> +
>  static void recalc_intercepts(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *c, *h;
> @@ -964,6 +980,8 @@ static __init int svm_hardware_setup(void)
>  
>  	if (avic) {
>  		printk(KERN_INFO "kvm: AVIC enabled\n");
> +	} else {
> +		svm_x86_ops.deliver_posted_interrupt = NULL;
>  	}
>  
>  	return 0;
> @@ -2877,8 +2895,10 @@ static int clgi_interception(struct vcpu_svm *svm)
>  	disable_gif(svm);
>  
>  	/* After a CLGI no interrupts should come */
> -	svm_clear_vintr(svm);
> -	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> +	if (!svm_vcpu_avic_enabled(svm)) {
> +		svm_clear_vintr(svm);
> +		svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> +	}

This is for nested virtualization.  Unless you support nested AVIC, the
L2 guest should run without AVIC (i.e. IsRunning should be false) and
use the old VINTR mechanism.

>  	mark_dirty(svm->vmcb, VMCB_INTR);
>  
> @@ -3453,6 +3473,9 @@ static int msr_interception(struct vcpu_svm *svm)
>  
>  static int interrupt_window_interception(struct vcpu_svm *svm)
>  {
> +	if (svm_vcpu_avic_enabled(svm))
> +		BUG();

Please WARN and return, but I suspect this may also need some changes
for nested virtualization.

>  	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>  	svm_clear_vintr(svm);
>  	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> @@ -3767,6 +3790,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
>  {
>  	struct vmcb_control_area *control;
>  
> +	/* The following fields are ignored when AVIC is enabled */
>  	control = &svm->vmcb->control;
>  	control->int_vector = irq;
>  	control->int_ctl &= ~V_INTR_PRIO_MASK;
> @@ -3844,6 +3868,20 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  	return;
>  }
>  
> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	kvm_lapic_set_vector(vec, svm->vcpu.arch.apic->regs + APIC_IRR);
> +	smp_mb__after_atomic();
> +
> +	if (avic_vcpu_is_running(vcpu))
> +		wrmsrl(SVM_AVIC_DOORBELL,
> +		       __default_cpu_present_to_apicid(vcpu->cpu));
> +	else
> +		kvm_vcpu_wake_up(vcpu);
> +}
> +
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3904,6 +3942,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
>  	 * get that intercept, this function will be called again though and
>  	 * we'll get the vintr intercept.
>  	 */
> +	if (svm_vcpu_avic_enabled(svm))
> +		return;

Same here.

>  	if (gif_set(svm) && nested_svm_intr(svm)) {
>  		svm_set_vintr(svm);
>  		svm_inject_irq(svm, 0x0);
> @@ -4638,6 +4679,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.sched_in = svm_sched_in,
>  
>  	.pmu_ops = &amd_pmu_ops,
> +	.deliver_posted_interrupt = svm_deliver_avic_intr,
>  };
>  
>  static int __init svm_init(void)
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC Suravee Suthikulpanit
@ 2016-03-18 10:24   ` Paolo Bonzini
  2016-03-28 11:27     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 10:24 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> Introduce trace events for AMD AVIC incomplete IPI vmexit, and
> AVIC unaccelerate access vmexit.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/kvm/trace.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c   |  2 ++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index ad9f6a2..3c85a3d 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -1288,6 +1288,63 @@ TRACE_EVENT(kvm_hv_stimer_cleanup,
>  		  __entry->vcpu_id, __entry->timer_index)
>  );
>  
> +/*
> + * Tracepoint for AMD AVIC
> + */
> +TRACE_EVENT(kvm_avic_incomplete_ipi,
> +	    TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index),
> +	    TP_ARGS(vcpu, icrh, icrl, id, index),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, vcpu)
> +		__field(u32, icrh)
> +		__field(u32, icrl)
> +		__field(u32, id)
> +		__field(u32, index)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->icrh = icrh;
> +		__entry->icrl = icrl;
> +		__entry->id = id;
> +		__entry->index = index;
> +	),
> +
> +	TP_printk("vcpu=%#x, icrh:icrl=%#010x:%08x, id=%u, index=%u\n",

vcpus are usually printed with %u.  Apart from this, the patch looks
good.  You can squash it in "svm: Add VMEXIT handlers for AVIC".

Paolo

> +		  __entry->vcpu, __entry->icrh, __entry->icrl,
> +		  __entry->id, __entry->index)
> +);
> +
> +TRACE_EVENT(kvm_avic_unaccelerated_access,
> +	    TP_PROTO(u32 vcpu, u32 offset, bool ft, bool rw, u32 vec),
> +	    TP_ARGS(vcpu, offset, ft, rw, vec),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, vcpu)
> +		__field(u32, offset)
> +		__field(bool, ft)
> +		__field(bool, rw)
> +		__field(u32, vec)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->offset = offset;
> +		__entry->ft = ft;
> +		__entry->rw = rw;
> +		__entry->vec = vec;
> +	),
> +
> +	TP_printk("vcpu=%#x, offset=%#x(%s), %s, %s, vec=%#x\n",
> +		  __entry->vcpu,
> +		  __entry->offset,
> +		  __print_symbolic(__entry->offset, kvm_trace_symbol_apic),
> +		  __entry->ft ? "trap" : "fault",
> +		  __entry->rw ? "write" : "read",
> +		  __entry->vec)
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4d2961d..775de1c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8395,3 +8395,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 09/12] svm: Add VMEXIT handlers for AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 09/12] svm: Add VMEXIT handlers " Suravee Suthikulpanit
@ 2016-03-18 11:11   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 11:11 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> Introduce VMEXIT handlers, avic_incp_ipi_interception() and
> avic_noaccel_interception().

The names of the functions have changed.

> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/uapi/asm/svm.h |   9 +-
>  arch/x86/kvm/lapic.h            |   3 +
>  arch/x86/kvm/svm.c              | 244 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 255 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 8a4add8..131b0b9 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -73,6 +73,8 @@
>  #define SVM_EXIT_MWAIT_COND    0x08c
>  #define SVM_EXIT_XSETBV        0x08d
>  #define SVM_EXIT_NPF           0x400
> +#define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
> +#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
>  
>  #define SVM_EXIT_ERR           -1
>  
> @@ -107,8 +109,10 @@
>  	{ SVM_EXIT_SMI,         "smi" }, \
>  	{ SVM_EXIT_INIT,        "init" }, \
>  	{ SVM_EXIT_VINTR,       "vintr" }, \
> +	{ SVM_EXIT_CR0_SEL_WRITE, "cr0_sec_write" }, \

sel_write.

>  	{ SVM_EXIT_CPUID,       "cpuid" }, \
>  	{ SVM_EXIT_INVD,        "invd" }, \
> +	{ SVM_EXIT_PAUSE,       "pause" }, \
>  	{ SVM_EXIT_HLT,         "hlt" }, \
>  	{ SVM_EXIT_INVLPG,      "invlpg" }, \
>  	{ SVM_EXIT_INVLPGA,     "invlpga" }, \
> @@ -127,7 +131,10 @@
>  	{ SVM_EXIT_MONITOR,     "monitor" }, \
>  	{ SVM_EXIT_MWAIT,       "mwait" }, \
>  	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
> -	{ SVM_EXIT_NPF,         "npf" }
> +	{ SVM_EXIT_NPF,         "npf" }, \
> +	{ SVM_EXIT_RSM,         "rsm" }, \
> +	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
> +	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }
>  
>  
>  #endif /* _UAPI__SVM_H */
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 7bf8184..ed23af0 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -9,6 +9,9 @@
>  #define KVM_APIC_SIPI		1
>  #define KVM_APIC_LVT_NUM	6
>  
> +#define KVM_APIC_SHORT_MASK	0xc0000
> +#define KVM_APIC_DEST_MASK	0x800
> +
>  struct kvm_timer {
>  	struct hrtimer timer;
>  	s64 period; 				/* unit: ns */
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8e31ad3..6303147 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3508,6 +3508,248 @@ static int mwait_interception(struct vcpu_svm *svm)
>  	return nop_interception(svm);
>  }
>  
> +enum avic_ipi_failure_cause {
> +	AVIC_IPI_FAILURE_INVALID_INT_TYPE,
> +	AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
> +	AVIC_IPI_FAILURE_INVALID_TARGET,
> +	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
> +};
> +
> +static int avic_incomplete_ipi_interception(struct vcpu_svm *svm)
> +{
> +	u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
> +	u32 icrl = svm->vmcb->control.exit_info_1;
> +	u32 id = svm->vmcb->control.exit_info_2 >> 32;
> +	u32 index = svm->vmcb->control.exit_info_2 && 0xFF;
> +	struct kvm_lapic *apic = svm->vcpu.arch.apic;
> +
> +	trace_kvm_avic_incomplete_ipi(svm->vcpu.vcpu_id, icrh, icrl, id, index);
> +
> +	switch (id) {
> +	case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
> +		/*
> +		 * AVIC hardware handles the generation of
> +		 * IPIs when the specified Message Type is Fixed
> +		 * (also known as fixed delivery mode) and
> +		 * the Trigger Mode is edge-triggered. The hardware
> +		 * also supports self and broadcast delivery modes
> +		 * specified via the Destination Shorthand(DSH)
> +		 * field of the ICRL. Logical and physical APIC ID
> +		 * formats are supported. All other IPI types cause
> +		 * a #VMEXIT, which needs to emulated.
> +		 */
> +		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
> +		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
> +		break;
> +	case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING: {
> +		int i;
> +		struct kvm_vcpu *vcpu;
> +		struct kvm *kvm = svm->vcpu.kvm;
> +		struct kvm_lapic *apic = svm->vcpu.arch.apic;
> +
> +		/*
> +		 * At this point, we expect that the AVIC HW has already
> +		 * set the appropriate IRR bits on the valid target
> +		 * vcpus. So, we just need to kick the appropriate vcpu.
> +		 */
> +		kvm_for_each_vcpu(i, vcpu, kvm) {
> +			bool m = kvm_apic_match_dest(vcpu, apic,
> +						     icrl & KVM_APIC_SHORT_MASK,
> +						     GET_APIC_DEST_FIELD(icrh),
> +						     icrl & KVM_APIC_DEST_MASK);
> +
> +			if (m && !avic_vcpu_is_running(vcpu))
> +				kvm_vcpu_wake_up(vcpu);
> +		}
> +		break;
> +	}
> +	case AVIC_IPI_FAILURE_INVALID_TARGET:
> +		break;
> +	case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
> +		WARN_ONCE(1, "Invalid backing page\n");
> +		break;
> +	default:
> +		pr_err("Unknown IPI interception\n");
> +	}
> +
> +	return 1;
> +}
> +
> +static u32 *avic_get_log_apic_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
> +{
> +	struct kvm_arch *vm_data = &vcpu->kvm->arch;
> +	int index;
> +	u32 *avic_log_ait;

u32 *logical_apic_id_table;

> +
> +	if (flat) { /* flat */
> +		if (mda > 7)
> +			return NULL;
> +		index = mda;
> +	} else { /* cluster */
> +		int apic_id = mda & 0xf;
> +		int cluster_id = (mda & 0xf0) >> 8;
> +
> +		if (apic_id > 4 || cluster_id >= 0xf)
> +			return NULL;
> +		index = (cluster_id << 2) + apic_id;
> +	}
> +	avic_log_ait = (u32 *) page_address(vm_data->avic_log_apic_id_table_page);
> +
> +	return &avic_log_ait[index];
> +}
> +
> +static int avic_handle_ldr_write(struct kvm_vcpu *vcpu, u8 g_phy_apic_id,
> +				 u8 log_apic_id)
> +{
> +	u32 mod;
> +	u32 *entry;
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!svm)
> +		return -EINVAL;
> +
> +	mod = (kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR) >> 28) & 0xf;
> +	entry = avic_get_log_apic_id_entry(vcpu, log_apic_id, (mod == 0xf));
> +	if (!entry)
> +		return -EINVAL;
> +
> +	*entry &= ~AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK;
> +	*entry |= (g_phy_apic_id & AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK);
> +	*entry |= AVIC_LOG_APIC_ID__VALID_MSK;

I think you should do these computations in a separate u32 variable and
then assign it to *entry.

Apart from these, it looks pretty good!

Paolo

> +	return 0;
> +}
> +
> +static int avic_unaccel_trap_write(struct vcpu_svm *svm)
> +{
> +	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
> +	struct kvm_lapic *apic = svm->vcpu.arch.apic;
> +	u32 reg = kvm_apic_get_reg(apic, offset);
> +
> +	switch (offset) {
> +	case APIC_ID: {
> +		u32 aid = (reg >> 24) & 0xff;
> +		u64 *o_ent = avic_get_phy_apic_id_entry(&svm->vcpu,
> +							svm->vcpu.vcpu_id);
> +		u64 *n_ent = avic_get_phy_apic_id_entry(&svm->vcpu, aid);
> +
> +		if (!n_ent || !o_ent)
> +			return 0;
> +
> +		/* We need to move phy_apic_entry to new offset */
> +		*n_ent = *o_ent;
> +		*((u64 *)o_ent) = 0ULL;

Unnecessary cast.

> +		svm->avic_phy_apic_id_cache = n_ent;
> +		break;
> +	}
> +	case APIC_LDR: {
> +		int ret, lid;
> +		int dlid = (reg >> 24) & 0xff;
> +
> +		if (!dlid)
> +			return 0;
> +
> +		lid = ffs(dlid) - 1;
> +		ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
> +		if (ret)
> +			return 0;
> +
> +		break;
> +	}
> +	case APIC_DFR: {
> +		struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
> +		u32 mod = (reg >> 28) & 0xf;
> +
> +		/*
> +		 * We assume that all local APICs are using the same type.
> +		 * If this changes, we need to rebuild the AVIC logical
> +		 * APID id table with subsequent write to APIC_LDR.
> +		 */
> +		if (vm_data->ldr_mode != mod) {
> +			clear_page(page_address(vm_data->avic_log_apic_id_table_page));
> +			vm_data->ldr_mode = mod;
> +		}
> +		break;
> +	}
> +	default:
> +		break;
> +	}
> +
> +	kvm_lapic_reg_write(apic, offset, reg);
> +
> +	return 1;
> +}
> +
> +static bool is_avic_unaccelerated_access_trap(u32 offset)
> +{
> +	bool ret = false;
> +
> +	switch (offset) {
> +	case APIC_ID:
> +	case APIC_EOI:
> +	case APIC_RRR:
> +	case APIC_LDR:
> +	case APIC_DFR:
> +	case APIC_SPIV:
> +	case APIC_ESR:
> +	case APIC_ICR:
> +	case APIC_LVTT:
> +	case APIC_LVTTHMR:
> +	case APIC_LVTPC:
> +	case APIC_LVT0:
> +	case APIC_LVT1:
> +	case APIC_LVTERR:
> +	case APIC_TMICT:
> +	case APIC_TDCR:
> +		ret = true;
> +		break;
> +	default:
> +		break;
> +	}
> +	return ret;
> +}
> +
> +#define AVIC_UNACCEL_ACCESS_WRITE_MSK		1
> +#define AVIC_UNACCEL_ACCESS_OFFSET_MSK		0xFF0
> +#define AVIC_UNACCEL_ACCESS_VECTOR_MSK		0xFFFFFFFF
> +
> +static int avic_unaccelerated_access_interception(struct vcpu_svm *svm)
> +{
> +	int ret = 0;
> +	u32 offset = svm->vmcb->control.exit_info_1 &
> +		     AVIC_UNACCEL_ACCESS_OFFSET_MSK;
> +	u32 vector = svm->vmcb->control.exit_info_2 &
> +		     AVIC_UNACCEL_ACCESS_VECTOR_MSK;
> +	bool write = (svm->vmcb->control.exit_info_1 >> 32) &
> +		     AVIC_UNACCEL_ACCESS_WRITE_MSK;
> +	bool trap = is_avic_unaccelerated_access_trap(offset);
> +
> +	trace_kvm_avic_unaccelerated_access(svm->vcpu.vcpu_id, offset,
> +					    trap, write, vector);
> +
> +	/**
> +	 * AVIC does not support x2APIC registers, and we only advertise
> +	 * xAPIC when enable AVIC. Therefore, access to these registers
> +	 * will not be supported.
> +	 */
> +	if (offset >= 0x400) {
> +		WARN(1, "Unsupported APIC offset %#x\n", offset);
> +		return ret;
> +	}
> +
> +	if (trap) {
> +		/* Handling Trap */
> +		if (!write) /* Trap read should never happens */
> +			BUG();
> +		ret = avic_unaccel_trap_write(svm);
> +	} else {
> +		/* Handling Fault */
> +		ret = (emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
> +	}
> +
> +	return ret;
> +}
> +
>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>  	[SVM_EXIT_READ_CR0]			= cr_interception,
>  	[SVM_EXIT_READ_CR3]			= cr_interception,
> @@ -3571,6 +3813,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>  	[SVM_EXIT_XSETBV]			= xsetbv_interception,
>  	[SVM_EXIT_NPF]				= pf_interception,
>  	[SVM_EXIT_RSM]                          = emulate_on_interception,
> +	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
> +	[SVM_EXIT_AVIC_UNACCELERATED_ACCESS]	= avic_unaccelerated_access_interception,
>  };
>  
>  static void dump_vmcb(struct kvm_vcpu *vcpu)
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions
  2016-03-18  6:09 ` [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions Suravee Suthikulpanit
@ 2016-03-18 11:16   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 11:16 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> Exporting LAPIC utility functions and macros for re-use in SVM code.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/kvm/lapic.c | 110 ++++++++++++++++++++++++++-------------------------
>  arch/x86/kvm/lapic.h |   8 ++++
>  2 files changed, 65 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 3a045f3..6da1628 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -59,9 +59,8 @@
>  /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
>  #define apic_debug(fmt, arg...)
>  
> -#define APIC_LVT_NUM			6
>  /* 14 is the version for Xeon and Pentium 8.4.8*/
> -#define APIC_VERSION			(0x14UL | ((APIC_LVT_NUM - 1) << 16))
> +#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
>  #define LAPIC_MMIO_LENGTH		(1 << 12)
>  /* followed define is not in apicdef.h */
>  #define APIC_SHORT_MASK			0xc0000
> @@ -76,10 +75,11 @@
>  #define VEC_POS(v) ((v) & (32 - 1))
>  #define REG_POS(v) (((v) >> 5) << 4)
>  
> -static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
> +void kvm_lapic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
>  {
>  	*((u32 *) (apic->regs + reg_off)) = val;
>  }
> +EXPORT_SYMBOL_GPL(kvm_lapic_set_reg);
>  
>  static inline int apic_test_vector(int vec, void *bitmap)
>  {

Please keep the function inline, moving it to lapic.h

> @@ -94,10 +94,11 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
>  		apic_test_vector(vector, apic->regs + APIC_IRR);
>  }
>  
> -static inline void apic_set_vector(int vec, void *bitmap)
> +void kvm_lapic_set_vector(int vec, void *bitmap)
>  {
>  	set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>  }
> +EXPORT_SYMBOL_GPL(kvm_lapic_set_vector);

Same here, and also move apic_set_irr to lapic.h so that you don't have
to use the lower-level apic_set_vector directly.  (Of course renaming it
to kvm_lapic_set_irr).

Otherwise okay.

Paolo

>  static inline void apic_clear_vector(int vec, void *bitmap)
>  {
> @@ -212,7 +213,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
>  {
>  	bool enabled = val & APIC_SPIV_APIC_ENABLED;
>  
> -	apic_set_reg(apic, APIC_SPIV, val);
> +	kvm_lapic_set_reg(apic, APIC_SPIV, val);
>  
>  	if (enabled != apic->sw_enabled) {
>  		apic->sw_enabled = enabled;
> @@ -226,13 +227,13 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
>  
>  static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
>  {
> -	apic_set_reg(apic, APIC_ID, id << 24);
> +	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
>  	recalculate_apic_map(apic->vcpu->kvm);
>  }
>  
>  static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
>  {
> -	apic_set_reg(apic, APIC_LDR, id);
> +	kvm_lapic_set_reg(apic, APIC_LDR, id);
>  	recalculate_apic_map(apic->vcpu->kvm);
>  }
>  
> @@ -240,8 +241,8 @@ static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
>  {
>  	u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
>  
> -	apic_set_reg(apic, APIC_ID, id << 24);
> -	apic_set_reg(apic, APIC_LDR, ldr);
> +	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
> +	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
>  	recalculate_apic_map(apic->vcpu->kvm);
>  }
>  
> @@ -287,10 +288,10 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>  	feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
>  	if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
>  		v |= APIC_LVR_DIRECTED_EOI;
> -	apic_set_reg(apic, APIC_LVR, v);
> +	kvm_lapic_set_reg(apic, APIC_LVR, v);
>  }
>  
> -static const unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
> +static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
>  	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
>  	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
>  	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
> @@ -351,7 +352,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>  
>  static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
>  {
> -	apic_set_vector(vec, apic->regs + APIC_IRR);
> +	kvm_lapic_set_vector(vec, apic->regs + APIC_IRR);
>  	/*
>  	 * irr_pending must be true if any interrupt is pending; set it after
>  	 * APIC_IRR to avoid race with apic_clear_irr
> @@ -569,7 +570,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
>  		   apic, ppr, isr, isrv);
>  
>  	if (old_ppr != ppr) {
> -		apic_set_reg(apic, APIC_PROCPRI, ppr);
> +		kvm_lapic_set_reg(apic, APIC_PROCPRI, ppr);
>  		if (ppr < old_ppr)
>  			kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>  	}
> @@ -577,7 +578,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
>  
>  static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
>  {
> -	apic_set_reg(apic, APIC_TASKPRI, tpr);
> +	kvm_lapic_set_reg(apic, APIC_TASKPRI, tpr);
>  	apic_update_ppr(apic);
>  }
>  
> @@ -674,6 +675,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
>  		return false;
>  	}
>  }
> +EXPORT_SYMBOL_GPL(kvm_apic_match_dest);
>  
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> @@ -844,7 +846,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  
>  		if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) {
>  			if (trig_mode)
> -				apic_set_vector(vector, apic->regs + APIC_TMR);
> +				kvm_lapic_set_vector(vector, apic->regs + APIC_TMR);
>  			else
>  				apic_clear_vector(vector, apic->regs + APIC_TMR);
>  		}
> @@ -1109,7 +1111,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
>  	return container_of(dev, struct kvm_lapic, dev);
>  }
>  
> -static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
> +int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  		void *data)
>  {
>  	unsigned char alignment = offset & 0xf;
> @@ -1146,6 +1148,7 @@ static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>  	}
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);
>  
>  static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
>  {
> @@ -1163,7 +1166,7 @@ static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  	if (!apic_mmio_in_range(apic, address))
>  		return -EOPNOTSUPP;
>  
> -	apic_reg_read(apic, offset, len, data);
> +	kvm_lapic_reg_read(apic, offset, len, data);
>  
>  	return 0;
>  }
> @@ -1348,7 +1351,7 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
>  	}
>  }
>  
> -static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> +int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  {
>  	int ret = 0;
>  
> @@ -1380,7 +1383,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  
>  	case APIC_DFR:
>  		if (!apic_x2apic_mode(apic)) {
> -			apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
> +			kvm_lapic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
>  			recalculate_apic_map(apic->vcpu->kvm);
>  		} else
>  			ret = 1;
> @@ -1395,10 +1398,10 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  			int i;
>  			u32 lvt_val;
>  
> -			for (i = 0; i < APIC_LVT_NUM; i++) {
> +			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
>  				lvt_val = kvm_apic_get_reg(apic,
>  						       APIC_LVTT + 0x10 * i);
> -				apic_set_reg(apic, APIC_LVTT + 0x10 * i,
> +				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
>  					     lvt_val | APIC_LVT_MASKED);
>  			}
>  			apic_update_lvtt(apic);
> @@ -1409,14 +1412,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  	}
>  	case APIC_ICR:
>  		/* No delay here, so we always clear the pending bit */
> -		apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
> +		kvm_lapic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
>  		apic_send_ipi(apic);
>  		break;
>  
>  	case APIC_ICR2:
>  		if (!apic_x2apic_mode(apic))
>  			val &= 0xff000000;
> -		apic_set_reg(apic, APIC_ICR2, val);
> +		kvm_lapic_set_reg(apic, APIC_ICR2, val);
>  		break;
>  
>  	case APIC_LVT0:
> @@ -1430,7 +1433,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  			val |= APIC_LVT_MASKED;
>  
>  		val &= apic_lvt_mask[(reg - APIC_LVTT) >> 4];
> -		apic_set_reg(apic, reg, val);
> +		kvm_lapic_set_reg(apic, reg, val);
>  
>  		break;
>  
> @@ -1438,7 +1441,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  		if (!kvm_apic_sw_enabled(apic))
>  			val |= APIC_LVT_MASKED;
>  		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
> -		apic_set_reg(apic, APIC_LVTT, val);
> +		kvm_lapic_set_reg(apic, APIC_LVTT, val);
>  		apic_update_lvtt(apic);
>  		break;
>  
> @@ -1447,14 +1450,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  			break;
>  
>  		hrtimer_cancel(&apic->lapic_timer.timer);
> -		apic_set_reg(apic, APIC_TMICT, val);
> +		kvm_lapic_set_reg(apic, APIC_TMICT, val);
>  		start_apic_timer(apic);
>  		break;
>  
>  	case APIC_TDCR:
>  		if (val & 4)
>  			apic_debug("KVM_WRITE:TDCR %x\n", val);
> -		apic_set_reg(apic, APIC_TDCR, val);
> +		kvm_lapic_set_reg(apic, APIC_TDCR, val);
>  		update_divide_count(apic);
>  		break;
>  
> @@ -1467,7 +1470,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  
>  	case APIC_SELF_IPI:
>  		if (apic_x2apic_mode(apic)) {
> -			apic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
> +			kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
>  		} else
>  			ret = 1;
>  		break;
> @@ -1479,6 +1482,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  		apic_debug("Local APIC Write to read-only register %x\n", reg);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(kvm_lapic_reg_write);
>  
>  static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  			    gpa_t address, int len, const void *data)
> @@ -1508,7 +1512,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  		apic_debug("%s: offset 0x%x with length 0x%x, and value is "
>  			   "0x%x\n", __func__, offset, len, val);
>  
> -	apic_reg_write(apic, offset & 0xff0, val);
> +	kvm_lapic_reg_write(apic, offset & 0xff0, val);
>  
>  	return 0;
>  }
> @@ -1516,7 +1520,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
>  {
>  	if (kvm_vcpu_has_lapic(vcpu))
> -		apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
> +		kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
>  }
>  EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
>  
> @@ -1528,10 +1532,10 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  	/* hw has done the conditional check and inst decode */
>  	offset &= 0xff0;
>  
> -	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
> +	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
>  
>  	/* TODO: optimize to just emulate side effect w/o one more write */
> -	apic_reg_write(vcpu->arch.apic, offset, val);
> +	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
>  
> @@ -1670,28 +1674,28 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  		kvm_apic_set_id(apic, vcpu->vcpu_id);
>  	kvm_apic_set_version(apic->vcpu);
>  
> -	for (i = 0; i < APIC_LVT_NUM; i++)
> -		apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
> +	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
> +		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
>  	apic_update_lvtt(apic);
>  	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
> -		apic_set_reg(apic, APIC_LVT0,
> +		kvm_lapic_set_reg(apic, APIC_LVT0,
>  			     SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
>  	apic_manage_nmi_watchdog(apic, kvm_apic_get_reg(apic, APIC_LVT0));
>  
> -	apic_set_reg(apic, APIC_DFR, 0xffffffffU);
> +	kvm_lapic_set_reg(apic, APIC_DFR, 0xffffffffU);
>  	apic_set_spiv(apic, 0xff);
> -	apic_set_reg(apic, APIC_TASKPRI, 0);
> +	kvm_lapic_set_reg(apic, APIC_TASKPRI, 0);
>  	if (!apic_x2apic_mode(apic))
>  		kvm_apic_set_ldr(apic, 0);
> -	apic_set_reg(apic, APIC_ESR, 0);
> -	apic_set_reg(apic, APIC_ICR, 0);
> -	apic_set_reg(apic, APIC_ICR2, 0);
> -	apic_set_reg(apic, APIC_TDCR, 0);
> -	apic_set_reg(apic, APIC_TMICT, 0);
> +	kvm_lapic_set_reg(apic, APIC_ESR, 0);
> +	kvm_lapic_set_reg(apic, APIC_ICR, 0);
> +	kvm_lapic_set_reg(apic, APIC_ICR2, 0);
> +	kvm_lapic_set_reg(apic, APIC_TDCR, 0);
> +	kvm_lapic_set_reg(apic, APIC_TMICT, 0);
>  	for (i = 0; i < 8; i++) {
> -		apic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
> -		apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
> -		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
> +		kvm_lapic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
> +		kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
> +		kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
>  	}
>  	apic->irr_pending = vcpu->arch.apicv_active;
>  	apic->isr_count = vcpu->arch.apicv_active ? 1 : 0;
> @@ -2073,8 +2077,8 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  
>  	/* if this is ICR write vector before command */
>  	if (reg == APIC_ICR)
> -		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> -	return apic_reg_write(apic, reg, (u32)data);
> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> +	return kvm_lapic_reg_write(apic, reg, (u32)data);
>  }
>  
>  int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
> @@ -2091,10 +2095,10 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
>  		return 1;
>  	}
>  
> -	if (apic_reg_read(apic, reg, 4, &low))
> +	if (kvm_lapic_reg_read(apic, reg, 4, &low))
>  		return 1;
>  	if (reg == APIC_ICR)
> -		apic_reg_read(apic, APIC_ICR2, 4, &high);
> +		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
>  
>  	*data = (((u64)high) << 32) | low;
>  
> @@ -2110,8 +2114,8 @@ int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
>  
>  	/* if this is ICR write vector before command */
>  	if (reg == APIC_ICR)
> -		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> -	return apic_reg_write(apic, reg, (u32)data);
> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> +	return kvm_lapic_reg_write(apic, reg, (u32)data);
>  }
>  
>  int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
> @@ -2122,10 +2126,10 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
>  	if (!kvm_vcpu_has_lapic(vcpu))
>  		return 1;
>  
> -	if (apic_reg_read(apic, reg, 4, &low))
> +	if (kvm_lapic_reg_read(apic, reg, 4, &low))
>  		return 1;
>  	if (reg == APIC_ICR)
> -		apic_reg_read(apic, APIC_ICR2, 4, &high);
> +		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
>  
>  	*data = (((u64)high) << 32) | low;
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 41bdb35..7bf8184 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -7,6 +7,7 @@
>  
>  #define KVM_APIC_INIT		0
>  #define KVM_APIC_SIPI		1
> +#define KVM_APIC_LVT_NUM	6
>  
>  struct kvm_timer {
>  	struct hrtimer timer;
> @@ -56,6 +57,12 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
>  void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
>  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
>  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
> +void kvm_lapic_set_vector(int vec, void *bitmap);
> +int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
> +int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
> +		       void *data);
> +bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
> +			   int short_hand, unsigned int dest, int dest_mode);
>  
>  void __kvm_apic_update_irr(u32 *pir, void *regs);
>  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
> @@ -96,6 +103,7 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
>  int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
>  void kvm_lapic_init(void);
>  
> +void kvm_lapic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val);
>  static inline u32 kvm_apic_get_reg(struct kvm_lapic *apic, int reg_off)
>  {
>  	        return *((u32 *) (apic->regs + reg_off));
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support
  2016-03-18  6:09 ` [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-03-18 11:21   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-18 11:21 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
> This patch introduces AVIC-related data structure, and AVIC
> initialization code.
> 
> There are three main data structures for AVIC:
>     * Virtual APIC (vAPIC) backing page (per-VCPU)
>     * Physical APIC ID table (per-VM)
>     * Logical APIC ID table (per-VM)
> 
> Currently, AVIC is disabled by default. Users can manually
> enable AVIC via kernel boot option kvm-amd.avic=1 or during
> kvm-amd module loading with parameter avic=1.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   8 ++
>  arch/x86/include/asm/svm.h      |   3 +
>  arch/x86/kvm/svm.c              | 232 +++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 242 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 87eac2a..2698e40 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -754,6 +754,14 @@ struct kvm_arch {
>  
>  	bool irqchip_split;
>  	u8 nr_reserved_ioapic_pins;
> +
> +	/* Struct members for AVIC */
> +	struct hlist_node hnode;

Unused, please remove.

> +	struct kvm *kvm;

Not needed, please use container_of instead like

	struct kvm_arch *ka;
	...
	struct kvm *kvm = container_of(ka, struct kvm, arch)

> +	u32 ldr_mode;
> +	u32 avic_tag;

Where is avic_tag used?

> +	struct page *avic_log_apic_id_table_page;

Remove apic_ if you want to keep the variable name shorter, but please
spell logical and physical in full.

> +	struct page *avic_phy_apic_id_table_page;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 66e26a0..287e635 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -116,6 +116,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define V_INTR_MASKING_SHIFT 24
>  #define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
>  
> +#define AVIC_ENABLE_SHIFT 31
> +#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
> +
>  #define SVM_INTERRUPT_SHADOW_MASK 1
>  
>  #define SVM_IOIO_STR_SHIFT 2
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c13a64b..621c948 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -14,6 +14,9 @@
>   * the COPYING file in the top-level directory.
>   *
>   */
> +
> +#define pr_fmt(fmt) "SVM: " fmt
> +
>  #include <linux/kvm_host.h>
>  
>  #include "irq.h"
> @@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
>  #define TSC_RATIO_MIN		0x0000000000000001ULL
>  #define TSC_RATIO_MAX		0x000000ffffffffffULL
>  
> +#define AVIC_HPA_MASK	~((0xFFFULL << 52) || 0xFFF)
> +
> +/* NOTE: Current max index allowed for physical APIC ID table is 255 */
> +#define AVIC_PHY_APIC_ID_MAX	0xFF
> +
>  static bool erratum_383_found __read_mostly;
>  
>  static const u32 host_save_user_msrs[] = {
> @@ -162,8 +170,20 @@ struct vcpu_svm {
>  
>  	/* cached guest cpuid flags for faster access */
>  	bool nrips_enabled	: 1;
> +
> +	struct page *avic_backing_page;
> +	u64 *avic_phy_apic_id_cache;

Spelling same as above.

> +	bool avic_was_running;
>  };
>  
> +#define AVIC_LOG_APIC_ID__GUEST_PHY_APIC_ID_MSK	(0xFF)

MSK->MASK
LOG_APIC->LOGICAL
PHY_APIC->PHYSICAL

> +#define AVIC_LOG_APIC_ID__VALID_MSK		(1 << 31)
> +
> +#define AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK	(0xFFULL)
> +#define AVIC_PHY_APIC_ID__BACKING_PAGE_MSK	(0xFFFFFFFFFFULL << 12)
> +#define AVIC_PHY_APIC_ID__IS_RUN_MSK		(1ULL << 62)

IS_RUN->IS_RUNNING

> +#define AVIC_PHY_APIC_ID__VALID_MSK		(1ULL << 63)
> +
>  static DEFINE_PER_CPU(u64, current_tsc_ratio);
>  #define TSC_RATIO_DEFAULT	0x0100000000ULL
>  
> @@ -205,6 +225,10 @@ module_param(npt, int, S_IRUGO);
>  static int nested = true;
>  module_param(nested, int, S_IRUGO);
>  
> +/* enable / disable AVIC */
> +static int avic;
> +module_param(avic, int, S_IRUGO);
> +
>  static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>  static void svm_flush_tlb(struct kvm_vcpu *vcpu);
>  static void svm_complete_interrupts(struct vcpu_svm *svm);
> @@ -234,6 +258,18 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
>  
> +#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
> +
> +static inline bool svm_vcpu_avic_enabled(struct vcpu_svm *svm)
> +{
> +	return (avic && (svm->vmcb->control.int_ctl & AVIC_ENABLE_MASK));
> +}
> +
> +static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
> +{
> +	svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
> +}
> +
>  static inline void mark_all_dirty(struct vmcb *vmcb)
>  {
>  	vmcb->control.clean = 0;
> @@ -923,6 +959,13 @@ static __init int svm_hardware_setup(void)
>  	} else
>  		kvm_disable_tdp();
>  
> +	if (avic && (!npt_enabled || !boot_cpu_has(X86_FEATURE_AVIC)))
> +		avic = false;
> +
> +	if (avic) {
> +		printk(KERN_INFO "kvm: AVIC enabled\n");
> +	}
> +
>  	return 0;
>  
>  err:
> @@ -1000,6 +1043,24 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
>  	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
>  
> +static void avic_init_vmcb(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = svm->vmcb;
> +	struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
> +	phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
> +	phys_addr_t lpa = page_to_phys(vm_data->avic_log_apic_id_table_page);
> +	phys_addr_t ppa = page_to_phys(vm_data->avic_phy_apic_id_table_page);
> +
> +	if (!vmcb)
> +		return;
> +
> +	vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
> +	vmcb->control.avic_log_apic_id = lpa & AVIC_HPA_MASK;
> +	vmcb->control.avic_phy_apic_id = ppa & AVIC_HPA_MASK;
> +	vmcb->control.avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
> +	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> +}
> +
>  static void init_vmcb(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -1113,6 +1174,139 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	mark_all_dirty(svm->vmcb);
>  
>  	enable_gif(svm);
> +
> +	if (avic)
> +		avic_init_vmcb(svm);
> +}
> +
> +static u64 *avic_get_phy_apic_id_entry(struct kvm_vcpu *vcpu, int index)
> +{
> +	u64 *avic_phy_ait;

u64 *avic_physical_id_table;

> +	struct kvm_arch *vm_data = &vcpu->kvm->arch;
> +
> +	/* Note: APIC ID = 0xff is used for broadcast.
> +	 *       APIC ID > 0xff is reserved.
> +	 */
> +	if (index >= 0xff)
> +		return NULL;
> +
> +	avic_phy_ait = page_address(vm_data->avic_phy_apic_id_table_page);
> +
> +	return &avic_phy_ait[index];
> +}
> +
> +static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> +{
> +	u64 *entry, new_entry;
> +	int id = vcpu->vcpu_id;
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (id >= AVIC_PHY_APIC_ID_MAX)
> +		return -EINVAL;
> +
> +	if (!svm->vcpu.arch.apic->regs)
> +		return -EINVAL;
> +
> +	svm->avic_backing_page = virt_to_page(svm->vcpu.arch.apic->regs);
> +
> +	avic_init_vmcb(svm);
> +
> +	/* Setting AVIC backing page address in the phy APIC ID table */
> +	entry = avic_get_phy_apic_id_entry(vcpu, id);
> +	if (!entry)
> +		return -EINVAL;
> +
> +	new_entry = READ_ONCE(*entry);
> +	new_entry = (page_to_phys(svm->avic_backing_page) &
> +		     AVIC_PHY_APIC_ID__BACKING_PAGE_MSK) |
> +		     AVIC_PHY_APIC_ID__VALID_MSK;
> +	WRITE_ONCE(*entry, new_entry);
> +
> +	svm->avic_phy_apic_id_cache = entry;
> +
> +	return 0;
> +}
> +
> +static void avic_vm_uninit(struct kvm *kvm)
> +{
> +	unsigned long flags;
> +	struct kvm_arch *vm_data = &kvm->arch;
> +
> +	if (vm_data->avic_log_apic_id_table_page)
> +		__free_page(vm_data->avic_log_apic_id_table_page);
> +	if (vm_data->avic_phy_apic_id_table_page)
> +		__free_page(vm_data->avic_phy_apic_id_table_page);
> +}
> +
> +static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!avic)
> +		return;
> +
> +	if (svm->avic_phy_apic_id_cache)
> +		svm->avic_phy_apic_id_cache = NULL;
> +}
> +
> +static atomic_t avic_tag_gen = ATOMIC_INIT(1);
> +
> +static inline u32 avic_get_next_tag(void)
> +{
> +	u32 tag = atomic_read(&avic_tag_gen);
> +
> +	atomic_inc(&avic_tag_gen);
> +	return tag;

This should be atomic_inc_return, otherwise it's not thread-safe.

> +}
> +
> +static int avic_vm_init(struct kvm *kvm)
> +{
> +	int err = -ENOMEM;
> +	struct kvm_arch *vm_data = &kvm->arch;
> +	struct page *p_page;
> +	struct page *l_page;
> +
> +	/**
> +	 * Note:
> +	 * AVIC hardware walks the nested page table to check permissions,
> +	 * but does not use the SPA address specified in the leaf page
> +	 * table entry since it uses  address in the AVIC_BACKING_PAGE pointer
> +	 * field of the VMCB. Therefore, we set up the
> +	 * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
> +	 */
> +	mutex_lock(&kvm->slots_lock);
> +	err = __x86_set_memory_region(kvm,
> +				      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> +				      APIC_DEFAULT_PHYS_BASE,
> +				      PAGE_SIZE);
> +	mutex_unlock(&kvm->slots_lock);

Please use x86_set_memory_region, since you don't have to do anything
else while the lock is taken.

Paolo

> +	if (err)
> +		goto free_avic;
> +	kvm->arch.apic_access_page_done = true;
> +
> +	/* Allocating physical APIC ID table (4KB) */
> +	p_page = alloc_page(GFP_KERNEL);
> +	if (!p_page)
> +		goto free_avic;
> +
> +	vm_data->avic_phy_apic_id_table_page = p_page;
> +	clear_page(page_address(p_page));
> +
> +	/* Allocating logical APIC ID table (4KB) */
> +	l_page = alloc_page(GFP_KERNEL);
> +	if (!l_page)
> +		goto free_avic;
> +
> +	vm_data->avic_log_apic_id_table_page = l_page;
> +	clear_page(page_address(l_page));
> +
> +	vm_data->avic_tag = avic_get_next_tag();
> +
> +	return 0;
> +
> +free_avic:
> +	avic_vm_uninit(kvm);
> +	return err;
>  }
>  
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> @@ -1131,6 +1325,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  
>  	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>  	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> +
> +	if (svm_vcpu_avic_enabled(svm) && !init_event)
> +		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
>  }
>  
>  static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
> @@ -1169,6 +1366,14 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
>  	if (!hsave_page)
>  		goto free_page3;
>  
> +	if (avic) {
> +		err = avic_init_backing_page(&svm->vcpu);
> +		if (err) {
> +			avic_vcpu_uninit(&svm->vcpu);
> +			goto free_page4;
> +		}
> +	}
> +
>  	svm->nested.hsave = page_address(hsave_page);
>  
>  	svm->msrpm = page_address(msrpm_pages);
> @@ -1187,6 +1392,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
>  
>  	return &svm->vcpu;
>  
> +free_page4:
> +	__free_page(hsave_page);
>  free_page3:
>  	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
>  free_page2:
> @@ -1209,6 +1416,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
>  	__free_page(virt_to_page(svm->nested.hsave));
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> +	avic_vcpu_uninit(vcpu);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
>  }
> @@ -3371,6 +3579,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
>  	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
>  	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
>  	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
> +	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
>  	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
>  	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
>  	pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
> @@ -3602,11 +3811,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  
>  static bool svm_get_enable_apicv(void)
>  {
> -	return false;
> +	return avic;
>  }
>  
> +static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> +{
> +}
> +
> +static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
> +{
> +}
> +
> +/* Note: Currently only used by Hyper-V. */
>  static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = svm->vmcb;
> +
> +	if (!avic)
> +		return;
> +
> +	vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
>  }
>  
>  static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> @@ -4318,6 +4543,9 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.vcpu_free = svm_free_vcpu,
>  	.vcpu_reset = svm_vcpu_reset,
>  
> +	.vm_init = avic_vm_init,
> +	.vm_uninit = avic_vm_uninit,
> +
>  	.prepare_guest_switch = svm_prepare_guest_switch,
>  	.vcpu_load = svm_vcpu_load,
>  	.vcpu_put = svm_vcpu_put,
> @@ -4375,6 +4603,8 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
>  	.load_eoi_exitmap = svm_load_eoi_exitmap,
>  	.sync_pir_to_irr = svm_sync_pir_to_irr,
> +	.hwapic_irr_update = svm_hwapic_irr_update,
> +	.hwapic_isr_update = svm_hwapic_isr_update,
>  
>  	.set_tss_addr = svm_set_tss_addr,
>  	.get_tdp_level = get_npt_level,
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
@ 2016-03-18 20:59   ` Radim Krčmář
  2016-03-31  4:15     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Radim Krčmář @ 2016-03-18 20:59 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-18 01:09-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> Since AVIC only virtualizes xAPIC hardware for the guest, we need to:
>     * Intercept APIC BAR msr accesses to disable x2APIC
>     * Intercept CPUID access to not advertise x2APIC support
>     * Hide x2APIC support when checking via KVM ioctl
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/kvm/svm.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 6303147..ba84d57 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -209,6 +209,7 @@ static const struct svm_direct_access_msrs {
>  	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>  	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
>  	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
> +	{ .index = MSR_IA32_APICBASE,			.always = false },
>  	{ .index = MSR_INVALID,				.always = false },
>  };
>  
> @@ -853,6 +854,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>  
>  		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
>  	}
> +
> +	if (svm_vcpu_avic_enabled(svm))
> +		set_msr_interception(msrpm, MSR_IA32_APICBASE, 1, 1);

AVIC really won't exit on writes to MSR_IA32_APICBASE otherwise?

> @@ -3308,6 +3312,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			msr_info->data = 0x1E;
>  		}
>  		break;
> +	case MSR_IA32_APICBASE:
> +		if (svm_vcpu_avic_enabled(svm)) {
> +			/* Note:
> +			 * For AVIC, we need to disable X2APIC
> +			 * and enable XAPIC
> +			 */
> +			kvm_get_msr_common(vcpu, msr_info);
> +			msr_info->data &= ~X2APIC_ENABLE;
> +			msr_info->data |= XAPIC_ENABLE;
> +			break;

No.  This won't make the guest switch to xAPIC.
x2APIC can only be enabled if CPUID has that flag and it's impossible to
toggle that CPUID flag it during runtime.

> +		}
> +		/* Follow through if not AVIC */
>  	default:
>  		return kvm_get_msr_common(vcpu, msr_info);
>  	}
> @@ -3436,6 +3452,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_VM_IGNNE:
>  		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>  		break;
> +	case MSR_IA32_APICBASE:
> +		if (svm_vcpu_avic_enabled(svm))
> +			avic_update_vapic_bar(to_svm(vcpu), data);

There is no connection to x2APIC, please do it in a different patch.

> +		/* Follow through */
>  	default:
>  		return kvm_set_msr_common(vcpu, msr);
>  	}
> @@ -4554,11 +4574,26 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
>  
>  	/* Update nrips enabled cache */
>  	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
> +
> +	/* Do not support X2APIC when enable AVIC */
> +	if (svm_vcpu_avic_enabled(svm)) {
> +		int i;
> +
> +		for (i = 0 ; i < vcpu->arch.cpuid_nent ; i++) {
> +			if (vcpu->arch.cpuid_entries[i].function == 1)

Please use kvm_find_cpuid_entry for the search.

> +				vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);

and X86_FEATURE_X2APIC (or something with X2APIC in name) for the bit.

The code will become so obvious that the comment can be removed. :)

> +		}
> +	}
>  }
>  
>  static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
>  {
>  	switch (func) {
> +	case 0x00000001:
> +		/* Do not support X2APIC when enable AVIC */
> +		if (avic)
> +			entry->ecx &= ~(1 << 21);

I think this might be the right place for the code you have in
svm_cpuid_update.

Btw. how does x2APIC behave under AVIC?
We definitely shouldn't recommend/expose x2APIC with AVIC as AVIC
doesn't accelerate x2APIC guest-facing interface, but the MSR interface
is going to exit and host-side interrupt delivery will probably still
work, so I don't see a huge problem with it.

> +		break;
>  	case 0x80000001:
>  		if (nested)
>  			entry->ecx |= (1 << 2); /* Set SVM bit */
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 11/12] svm: Do not intercept CR8 when enable AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 11/12] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-03-18 21:10   ` Radim Krčmář
  2016-03-30 12:15     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Radim Krčmář @ 2016-03-18 21:10 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-18 01:09-0500, Suravee Suthikulpanit:
> When enable AVIC:
>     * Do not intercept CR8 since this should be handled by AVIC HW.
>     * Also update TPR in APIC backing page when syncing CR8 before VMRUN
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/kvm/svm.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ba84d57..d5418c3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -984,6 +984,8 @@ static __init int svm_hardware_setup(void)
>  
>  	if (avic) {
>  		printk(KERN_INFO "kvm: AVIC enabled\n");
> +
> +		svm_x86_ops.update_cr8_intercept = NULL;

This doesn't look right.

>  	} else {
>  		svm_x86_ops.deliver_posted_interrupt = NULL;
>  	}
> @@ -1097,7 +1099,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
> -	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> +	if (!svm_vcpu_avic_enabled(svm))
> +		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>  
>  	set_dr_intercepts(svm);
>  
> @@ -4080,7 +4083,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> +	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
> +	     svm_vcpu_avic_enabled(svm))
>  		return;
>  
>  	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> @@ -4271,9 +4275,12 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>  		return;

I think we can exit early with svm_vcpu_avic_enabled().

>  
> -	cr8 = kvm_get_cr8(vcpu);
> +	cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
>  	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
> -	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
> +	svm->vmcb->control.int_ctl |= cr8;
> +
> +	if (svm_vcpu_avic_enabled(svm))
> +		kvm_lapic_set_reg(svm->vcpu.arch.apic, APIC_TASKPRI, (u32)cr8 << 4);

kvm_get_cr8() == kvm_lapic_get_reg(APIC_TASKPRI) >> 4.

>  }
>  
>  static void svm_complete_interrupts(struct vcpu_svm *svm)
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-03-18  6:09 ` [PART1 RFC v3 12/12] svm: Manage vcpu load/unload " Suravee Suthikulpanit
@ 2016-03-18 21:44   ` Radim Krčmář
  2016-03-31  8:52     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Radim Krčmář @ 2016-03-18 21:44 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-18 01:09-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> When a vcpu is loaded/unloaded to a physical core, we need to update
> host physical APIC ID information in the Physical APIC-ID table
> accordingly.
> 
> Also, when vCPU is blocking/un-blocking (due to halt instruction),
> we need to make sure that the is-running bit in set accordingly in the
> physical APIC-ID table.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/kvm/svm.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 121 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
> +{
> +	int h_phy_apic_id;

(Paolo said a lot about those names.)

> +	u64 *entry, new_entry;
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	int ret = 0;
> +	if (!svm_vcpu_avic_enabled(svm))
> +		return 0;

(The check for NULL below feels weird when it has already been used.)

> +
> +	if (!svm)
> +		return -EINVAL;

!svm means that KVM completely blew up ... don't check for it.
(See implementation of to_svm.)

> +
> +	/* Note: APIC ID = 0xff is used for broadcast.
> +	 *       APIC ID > 0xff is reserved.
> +	 */
> +	h_phy_apic_id = __default_cpu_present_to_apicid(cpu);
> +
> +	if (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX)
> +		return -EINVAL;
> +
> +	entry = svm->avic_phy_apic_id_cache;

The naming is confusing ... can avic_phy_apic_id_cache change during
execution of this function?
If yes, then add READ_ONCE and distinguish the pointer name.
If not, then use svm->avic_phy_apic_id_cache directly.

entry would be ok name for current new_entry.

> +	if (!entry)
> +		return -EINVAL;
> +
> +	if (is_load) {
> +		new_entry = READ_ONCE(*entry);

Please move this before the if.

> +		BUG_ON(new_entry & AVIC_PHY_APIC_ID__IS_RUN_MSK);
> +
> +		new_entry &= ~AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK;
> +		new_entry |= (h_phy_apic_id & AVIC_PHY_APIC_ID__HOST_PHY_APIC_ID_MSK);
> +
> +		/**
> +		 * Restore AVIC running flag if it was set during
> +		 * vcpu unload.
> +		 */
> +		if (svm->avic_was_running)
> +			new_entry |= AVIC_PHY_APIC_ID__IS_RUN_MSK;
> +		else
> +			new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;

You even BUG_ON when AVIC_PHY_APIC_ID__IS_RUN_MSK is set, so there is no
reason to clear it.

(Also, don't BUG.)

> +
> +		WRITE_ONCE(*entry, new_entry);

This will translate to two writes in 32 bit mode and we need to write
physical ID first to avoid spurious doorbells ...
is the order guaranteed?

> +	} else {
> +		new_entry = READ_ONCE(*entry);
> +		/**
> +		 * This handles the case when vcpu is scheduled out
> +		 * and has not yet not called blocking. We save the
> +		 * AVIC running flag so that we can restore later.
> +		 */

is_running must be disabled in between ...blocking and ...unblocking,
because we don't want to miss interrupts and block forever.
I somehow don't get it from the comment. :)

> +		if (new_entry & AVIC_PHY_APIC_ID__IS_RUN_MSK) {
> +			svm->avic_was_running = true;
> +			new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;
> +			WRITE_ONCE(*entry, new_entry);
> +		} else {
> +			svm->avic_was_running = false;
> +		}

(This can be shorter by setting avic_was_running first.)

> +	}
> +
> +	return ret;

(return 0;)

> +}
> +
> +/**
> + * This function is called during VCPU halt/unhalt.
> + */
> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
> +{
> +	int ret = 0;
> +	int h_phy_apic_id;
> +	u64 *entry, new_entry;
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!svm_vcpu_avic_enabled(svm))
> +		return 0;
> +
> +	/* Note: APIC ID = 0xff is used for broadcast.
> +	 *       APIC ID > 0xff is reserved.
> +	 */
> +	h_phy_apic_id = __default_cpu_present_to_apicid(vcpu->cpu);
> +
> +	if (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX)
> +		return -EINVAL;

The cache should be valid only if this condition is true.
We can get rid of it in both function.

> +
> +	entry = svm->avic_phy_apic_id_cache;
> +	if (!entry)
> +		return -EINVAL;
> +
> +	if (is_run) {

Both READ_ONCE and WRITE_ONCE belong outside of the if.

> +		/* Handle vcpu unblocking after HLT */
> +		new_entry = READ_ONCE(*entry);
> +		new_entry |= AVIC_PHY_APIC_ID__IS_RUN_MSK;
> +		WRITE_ONCE(*entry, new_entry);
> +	} else {
> +		/* Handle vcpu blocking due to HLT */
> +		new_entry = READ_ONCE(*entry);
> +		new_entry &= ~AVIC_PHY_APIC_ID__IS_RUN_MSK;
> +		WRITE_ONCE(*entry, new_entry);
> +	}
> +
> +	return ret;
> +}
> +
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC
  2016-03-18 10:24   ` Paolo Bonzini
@ 2016-03-28 11:27     ` Suravee Suthikulpanit
  0 siblings, 0 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-28 11:27 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/18/16 17:24, Paolo Bonzini wrote:
>> +	TP_printk("vcpu=%#x, icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
> vcpus are usually printed with %u.  Apart from this, the patch looks
> good.  You can squash it in "svm: Add VMEXIT handlers for AVIC".
>
> Paolo
>

Sure, thanks for the feedback.

Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-18 10:11   ` Paolo Bonzini
@ 2016-03-29  5:27     ` Suravee Suthikulpanit
  2016-03-29 10:21       ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-29  5:27 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/18/16 17:11, Paolo Bonzini wrote:
>
> On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
>> >Adding function pointers in struct kvm_x86_ops for processor-specific
>> >layer to provide hooks for when KVM initialize and un-initialize VM.
> This is not the only thing your patch is doing, and the "other" change
> definitely needs a lot more explanation about why you did it and how you
> audited the code to ensure that it is safe.
>
> Paolo
>

Sorry, for not mentioning this earlier. I am moving the 
kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock) 
since I am calling the x86_set_memory_region() (which uses slots_lock) 
in the vm_init() hooks (for AVIC initialization).

Lemme re-check if this would be safe for other code.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-29  5:27     ` Suravee Suthikulpanit
@ 2016-03-29 10:21       ` Paolo Bonzini
  2016-03-29 11:47         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-29 10:21 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 29/03/2016 07:27, Suravee Suthikulpanit wrote:
>>
>>> >Adding function pointers in struct kvm_x86_ops for processor-specific
>>> >layer to provide hooks for when KVM initialize and un-initialize VM.
>> This is not the only thing your patch is doing, and the "other" change
>> definitely needs a lot more explanation about why you did it and how you
>> audited the code to ensure that it is safe.
>>
>> Paolo
>>
> 
> Sorry, for not mentioning this earlier. I am moving the
> kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock)
> since I am calling the x86_set_memory_region() (which uses slots_lock)
> in the vm_init() hooks (for AVIC initialization).
> 
> Lemme re-check if this would be safe for other code.

No problem.  In the meanwhile a patch got in ("KVM: fix spin_lock_init
order on x86") that should help you.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-29 10:21       ` Paolo Bonzini
@ 2016-03-29 11:47         ` Suravee Suthikulpanit
  2016-03-30 10:00           ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-29 11:47 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 03/29/2016 05:21 PM, Paolo Bonzini wrote:
>
>
> On 29/03/2016 07:27, Suravee Suthikulpanit wrote:
>>>
>>>>> Adding function pointers in struct kvm_x86_ops for processor-specific
>>>>> layer to provide hooks for when KVM initialize and un-initialize VM.
>>> This is not the only thing your patch is doing, and the "other" change
>>> definitely needs a lot more explanation about why you did it and how you
>>> audited the code to ensure that it is safe.
>>>
>>> Paolo
>>>
>>
>> Sorry, for not mentioning this earlier. I am moving the
>> kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock)
>> since I am calling the x86_set_memory_region() (which uses slots_lock)
>> in the vm_init() hooks (for AVIC initialization).
>>
>> Lemme re-check if this would be safe for other code.
>
> No problem.  In the meanwhile a patch got in ("KVM: fix spin_lock_init
> order on x86") that should help you.
>
> Thanks,
>
> Paolo
>

Right.... that's just what I need :) I'll re-base to the latest tip then.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-29 11:47         ` Suravee Suthikulpanit
@ 2016-03-30 10:00           ` Suravee Suthikulpanit
  2016-03-30 12:07             ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-30 10:00 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/29/16 18:47, Suravee Suthikulpanit wrote:
> Hi,
>
> On 03/29/2016 05:21 PM, Paolo Bonzini wrote:
>>
>>
>> On 29/03/2016 07:27, Suravee Suthikulpanit wrote:
>>>>
>>>>>> Adding function pointers in struct kvm_x86_ops for processor-specific
>>>>>> layer to provide hooks for when KVM initialize and un-initialize VM.
>>>> This is not the only thing your patch is doing, and the "other" change
>>>> definitely needs a lot more explanation about why you did it and how
>>>> you
>>>> audited the code to ensure that it is safe.
>>>>
>>>> Paolo
>>>>
>>>
>>> Sorry, for not mentioning this earlier. I am moving the
>>> kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock)
>>> since I am calling the x86_set_memory_region() (which uses slots_lock)
>>> in the vm_init() hooks (for AVIC initialization).
>>>
>>> Lemme re-check if this would be safe for other code.
>>
>> No problem.  In the meanwhile a patch got in ("KVM: fix spin_lock_init
>> order on x86") that should help you.
>>
>> Thanks,
>>
>> Paolo
>>
>
> Right.... that's just what I need :) I'll re-base to the latest tip then.

Actually, in the file virt/kvm/kvm_main.c, I still need to move the 
kvm_arch_init_vm() to some place after the call to kvm_alloc_memslots() 
since I am calling x86_set_memory_region() in the vm_init hook.

         r = -ENOMEM;
         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                 kvm->memslots[i] = kvm_alloc_memslots();
                 if (!kvm->memslots[i])
                         goto out_err_no_srcu;
         }

         if (init_srcu_struct(&kvm->srcu))
                 goto out_err_no_srcu;
         if (init_srcu_struct(&kvm->irq_srcu))
                 goto out_err_no_irq_srcu;
         for (i = 0; i < KVM_NR_BUSES; i++) {
                 kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
                                         GFP_KERNEL);
                 if (!kvm->buses[i])
                         goto out_err;
         }
//HERE
         r = kvm_arch_init_vm(kvm, type);
         if (r)
                 goto out_err;

Do you think that would be a problem?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-30 10:00           ` Suravee Suthikulpanit
@ 2016-03-30 12:07             ` Paolo Bonzini
  2016-03-30 12:18               ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-30 12:07 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 30/03/2016 12:00, Suravee Suthikulpanit wrote:
> Hi Paolo,
> 
> On 3/29/16 18:47, Suravee Suthikulpanit wrote:
>> Hi,
>>
>> On 03/29/2016 05:21 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 29/03/2016 07:27, Suravee Suthikulpanit wrote:
>>>>>
>>>>>>> Adding function pointers in struct kvm_x86_ops for
>>>>>>> processor-specific
>>>>>>> layer to provide hooks for when KVM initialize and un-initialize VM.
>>>>> This is not the only thing your patch is doing, and the "other" change
>>>>> definitely needs a lot more explanation about why you did it and how
>>>>> you
>>>>> audited the code to ensure that it is safe.
>>>>>
>>>>> Paolo
>>>>>
>>>>
>>>> Sorry, for not mentioning this earlier. I am moving the
>>>> kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock)
>>>> since I am calling the x86_set_memory_region() (which uses slots_lock)
>>>> in the vm_init() hooks (for AVIC initialization).
>>>>
>>>> Lemme re-check if this would be safe for other code.
>>>
>>> No problem.  In the meanwhile a patch got in ("KVM: fix spin_lock_init
>>> order on x86") that should help you.
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>
>> Right.... that's just what I need :) I'll re-base to the latest tip then.
> 
> Actually, in the file virt/kvm/kvm_main.c, I still need to move the
> kvm_arch_init_vm() to some place after the call to kvm_alloc_memslots()
> since I am calling x86_set_memory_region() in the vm_init hook.
> 
>         r = -ENOMEM;
>         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>                 kvm->memslots[i] = kvm_alloc_memslots();
>                 if (!kvm->memslots[i])
>                         goto out_err_no_srcu;
>         }
> 
>         if (init_srcu_struct(&kvm->srcu))
>                 goto out_err_no_srcu;
>         if (init_srcu_struct(&kvm->irq_srcu))
>                 goto out_err_no_irq_srcu;
>         for (i = 0; i < KVM_NR_BUSES; i++) {
>                 kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
>                                         GFP_KERNEL);
>                 if (!kvm->buses[i])
>                         goto out_err;
>         }
> //HERE
>         r = kvm_arch_init_vm(kvm, type);
>         if (r)
>                 goto out_err;
> 
> Do you think that would be a problem?

Can you delay that to after the creation of the first VCPU?

Allocating AVIC data structures is not required if userspace has not
called KVM_CREATE_IRQCHIP or enabled KVM_CAP_SPLIT_IRQCHIP.

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 11/12] svm: Do not intercept CR8 when enable AVIC
  2016-03-18 21:10   ` Radim Krčmář
@ 2016-03-30 12:15     ` Suravee Suthikulpanit
  0 siblings, 0 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-30 12:15 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi Radim,

On 3/19/16 04:10, Radim Krčmář wrote:
> 2016-03-18 01:09-0500, Suravee Suthikulpanit:
>> When enable AVIC:
>>      * Do not intercept CR8 since this should be handled by AVIC HW.
>>      * Also update TPR in APIC backing page when syncing CR8 before VMRUN
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>>   arch/x86/kvm/svm.c | 15 +++++++++++----
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index ba84d57..d5418c3 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -984,6 +984,8 @@ static __init int svm_hardware_setup(void)
>>
>>   	if (avic) {
>>   		printk(KERN_INFO "kvm: AVIC enabled\n");
>> +
>> +		svm_x86_ops.update_cr8_intercept = NULL;
>
> This doesn't look right.

Actually, I don't think this change isn't necessary since we don't even 
call kvm_x86_ops->update_cr8_intercept() if vcpu->arch.apicv_active (See 
arch/x86/kvm/x86.c: update_cr8_intercept()).

> [....]
>> @@ -4080,7 +4083,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>   {
>>   	struct vcpu_svm *svm = to_svm(vcpu);
>>
>> -	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>> +	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
>> +	     svm_vcpu_avic_enabled(svm))
>>   		return;
>>
>>   	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>> @@ -4271,9 +4275,12 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>>   	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>>   		return;
>
> I think we can exit early with svm_vcpu_avic_enabled().

Right.

>
>>
>> -	cr8 = kvm_get_cr8(vcpu);
>> +	cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
>>   	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
>> -	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
>> +	svm->vmcb->control.int_ctl |= cr8;
>> +
>> +	if (svm_vcpu_avic_enabled(svm))
>> +		kvm_lapic_set_reg(svm->vcpu.arch.apic, APIC_TASKPRI, (u32)cr8 << 4);
>
> kvm_get_cr8() == kvm_lapic_get_reg(APIC_TASKPRI) >> 4.

Good point. I'll clean up and simplify this function.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
  2016-03-30 12:07             ` Paolo Bonzini
@ 2016-03-30 12:18               ` Suravee Suthikulpanit
  0 siblings, 0 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-30 12:18 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/30/16 19:07, Paolo Bonzini wrote:
>
>
> On 30/03/2016 12:00, Suravee Suthikulpanit wrote:
>> Hi Paolo,
>>
>> On 3/29/16 18:47, Suravee Suthikulpanit wrote:
>>> Hi,
>>>
>>> On 03/29/2016 05:21 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 29/03/2016 07:27, Suravee Suthikulpanit wrote:
>>>>>>
>>>>>>>> Adding function pointers in struct kvm_x86_ops for
>>>>>>>> processor-specific
>>>>>>>> layer to provide hooks for when KVM initialize and un-initialize VM.
>>>>>> This is not the only thing your patch is doing, and the "other" change
>>>>>> definitely needs a lot more explanation about why you did it and how
>>>>>> you
>>>>>> audited the code to ensure that it is safe.
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>
>>>>> Sorry, for not mentioning this earlier. I am moving the
>>>>> kvm_arch_init_vm() call mainly to go after mutex_init(&kvm->slots_lock)
>>>>> since I am calling the x86_set_memory_region() (which uses slots_lock)
>>>>> in the vm_init() hooks (for AVIC initialization).
>>>>>
>>>>> Lemme re-check if this would be safe for other code.
>>>>
>>>> No problem.  In the meanwhile a patch got in ("KVM: fix spin_lock_init
>>>> order on x86") that should help you.
>>>>
>>>> Thanks,
>>>>
>>>> Paolo
>>>>
>>>
>>> Right.... that's just what I need :) I'll re-base to the latest tip then.
>>
>> Actually, in the file virt/kvm/kvm_main.c, I still need to move the
>> kvm_arch_init_vm() to some place after the call to kvm_alloc_memslots()
>> since I am calling x86_set_memory_region() in the vm_init hook.
>>
>>          r = -ENOMEM;
>>          for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>>                  kvm->memslots[i] = kvm_alloc_memslots();
>>                  if (!kvm->memslots[i])
>>                          goto out_err_no_srcu;
>>          }
>>
>>          if (init_srcu_struct(&kvm->srcu))
>>                  goto out_err_no_srcu;
>>          if (init_srcu_struct(&kvm->irq_srcu))
>>                  goto out_err_no_irq_srcu;
>>          for (i = 0; i < KVM_NR_BUSES; i++) {
>>                  kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
>>                                          GFP_KERNEL);
>>                  if (!kvm->buses[i])
>>                          goto out_err;
>>          }
>> //HERE
>>          r = kvm_arch_init_vm(kvm, type);
>>          if (r)
>>                  goto out_err;
>>
>> Do you think that would be a problem?
>
> Can you delay that to after the creation of the first VCPU?

Sure, I can. That's what I was doing originally before we introduce the 
vm_init hook. I just thought that this would make a nice place.

> Allocating AVIC data structures is not required if userspace has not
> called KVM_CREATE_IRQCHIP or enabled KVM_CAP_SPLIT_IRQCHIP.
>
> Paolo
>

Okay.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC
  2016-03-18 20:59   ` Radim Krčmář
@ 2016-03-31  4:15     ` Suravee Suthikulpanit
  2016-03-31 11:23       ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-31  4:15 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi Radim,

On 03/19/2016 03:59 AM, Radim Krčmář wrote:
> 2016-03-18 01:09-0500, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>
>> Since AVIC only virtualizes xAPIC hardware for the guest, we need to:
>>      * Intercept APIC BAR msr accesses to disable x2APIC
>>      * Intercept CPUID access to not advertise x2APIC support
>>      * Hide x2APIC support when checking via KVM ioctl
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>>   arch/x86/kvm/svm.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 42 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 6303147..ba84d57 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -209,6 +209,7 @@ static const struct svm_direct_access_msrs {
>>   	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
>>   	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
>>   	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
>> +	{ .index = MSR_IA32_APICBASE,			.always = false },
>>   	{ .index = MSR_INVALID,				.always = false },
>>   };
>>
>> @@ -853,6 +854,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>>
>>   		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
>>   	}
>> +
>> +	if (svm_vcpu_avic_enabled(svm))
>> +		set_msr_interception(msrpm, MSR_IA32_APICBASE, 1, 1);
>
> AVIC really won't exit on writes to MSR_IA32_APICBASE otherwise?

Actually, I got confused about this part. This should not be needed.

>
>> @@ -3308,6 +3312,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			msr_info->data = 0x1E;
>>   		}
>>   		break;
>> +	case MSR_IA32_APICBASE:
>> +		if (svm_vcpu_avic_enabled(svm)) {
>> +			/* Note:
>> +			 * For AVIC, we need to disable X2APIC
>> +			 * and enable XAPIC
>> +			 */
>> +			kvm_get_msr_common(vcpu, msr_info);
>> +			msr_info->data &= ~X2APIC_ENABLE;
>> +			msr_info->data |= XAPIC_ENABLE;
>> +			break;
>
> No.  This won't make the guest switch to xAPIC.
> x2APIC can only be enabled if CPUID has that flag and it's impossible to
> toggle that CPUID flag it during runtime.

This is also not needed since we already disable the x2APIC in the CPUID 
below.

>> +		}
>> +		/* Follow through if not AVIC */
>>   	default:
>>   		return kvm_get_msr_common(vcpu, msr_info);
>>   	}
>> @@ -3436,6 +3452,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>   	case MSR_VM_IGNNE:
>>   		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>>   		break;
>> +	case MSR_IA32_APICBASE:
>> +		if (svm_vcpu_avic_enabled(svm))
>> +			avic_update_vapic_bar(to_svm(vcpu), data);
>
> There is no connection to x2APIC, please do it in a different patch.

Right. I'll move this.

>
>> +		/* Follow through */
>>   	default:
>>   		return kvm_set_msr_common(vcpu, msr);
>>   	}
>> @@ -4554,11 +4574,26 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
>>
>>   	/* Update nrips enabled cache */
>>   	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
>> +
>> +	/* Do not support X2APIC when enable AVIC */
>> +	if (svm_vcpu_avic_enabled(svm)) {
>> +		int i;
>> +
>> +		for (i = 0 ; i < vcpu->arch.cpuid_nent ; i++) {
>> +			if (vcpu->arch.cpuid_entries[i].function == 1)
>
> Please use kvm_find_cpuid_entry for the search.
>
>> +				vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
>
> and X86_FEATURE_X2APIC (or something with X2APIC in name) for the bit.
>
> The code will become so obvious that the comment can be removed. :)

Good point. I can only find example of using (X86_FEATURE_X2APIC % 32) 
== 21.

>> +		}
>> +	}
>>   }
>>
>>   static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
>>   {
>>   	switch (func) {
>> +	case 0x00000001:
>> +		/* Do not support X2APIC when enable AVIC */
>> +		if (avic)
>> +			entry->ecx &= ~(1 << 21);
>
> I think this might be the right place for the code you have in
> svm_cpuid_update.

Right. I'll also make change to use (X86_FEATURE_X2APIC % 32)

> Btw. how does x2APIC behave under AVIC?
> We definitely shouldn't recommend/expose x2APIC with AVIC as AVIC
> doesn't accelerate x2APIC guest-facing interface,

Access to offset 0x400+ would generate #VMEXIT no accel fault 
read/write.  So, we will need to handle and emulate this in the host.

> but the MSR interface is going to exit and host-side interrupt
 > delivery will probably still work, so I don't see
 > a huge problem with it.

Agree that it will still work. However, in such case, the guest code 
would likely default to using x2APIC interface, which will not be 
handled by the AVIC hardware, and resulting in no performance 
improvement that we are trying to introduce.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-03-18 21:44   ` Radim Krčmář
@ 2016-03-31  8:52     ` Suravee Suthikulpanit
  2016-03-31 14:19       ` Radim Krčmář
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-31  8:52 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi Radim,

On 03/19/2016 04:44 AM, Radim Krčmář wrote:
> 2016-03-18 01:09-0500, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> +
>> +		WRITE_ONCE(*entry, new_entry);
>
> This will translate to two writes in 32 bit mode and we need to write
> physical ID first to avoid spurious doorbells ...
> is the order guaranteed?

Hm.. not sure on that.

>> +	} else {
>> +		new_entry = READ_ONCE(*entry);
>> +		/**
>> +		 * This handles the case when vcpu is scheduled out
>> +		 * and has not yet not called blocking. We save the
>> +		 * AVIC running flag so that we can restore later.
>> +		 */
>
> is_running must be disabled in between ...blocking and ...unblocking,
> because we don't want to miss interrupts and block forever.
> I somehow don't get it from the comment. :)

Not sure if I understand your concern.  However, the is_running bit 
setting/clearing should be handled in the avic_set_running below. This 
part only handles othe case when the is_running bit still set when 
calling vcpu_put (and later on loading some other vcpus). This way, when 
we are re-loading this vcpu, we can restore the is_running bit accordingly.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC
  2016-03-31  4:15     ` Suravee Suthikulpanit
@ 2016-03-31 11:23       ` Paolo Bonzini
  2016-04-05 10:14         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2016-03-31 11:23 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Radim Krčmář
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz



On 31/03/2016 06:15, Suravee Suthikulpanit wrote:
>>> +                vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
>> 
>> and X86_FEATURE_X2APIC (or something with X2APIC in name) for the
>> bit.
>> 
>> The code will become so obvious that the comment can be removed.
>> :)
> 
> Good point. I can only find example of using (X86_FEATURE_X2APIC %
> 32) == 21.

You can use bit(X86_FEATURE_X2APIC), it is defined in arch/x86/kvm/x86.h.

>> but the MSR interface is going to exit and host-side interrupt 
>> delivery will probably still work, so I don't see a huge problem
>> with it.
> 
> Agree that it will still work. However, in such case, the guest code
> would likely default to using x2APIC interface, which will not be
> handled by the AVIC hardware, and resulting in no performance
> improvement that we are trying to introduce.

You would still get some improvement from exit-free interrupt delivery.

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-03-31  8:52     ` Suravee Suthikulpanit
@ 2016-03-31 14:19       ` Radim Krčmář
  2016-04-05 10:07         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Radim Krčmář @ 2016-03-31 14:19 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-31 15:52+0700, Suravee Suthikulpanit:
> On 03/19/2016 04:44 AM, Radim Krčmář wrote:
>>2016-03-18 01:09-0500, Suravee Suthikulpanit:
>>>+	} else {
>>>+		new_entry = READ_ONCE(*entry);
>>>+		/**
>>>+		 * This handles the case when vcpu is scheduled out
>>>+		 * and has not yet not called blocking. We save the
>>>+		 * AVIC running flag so that we can restore later.
>>>+		 */
>>
>>is_running must be disabled in between ...blocking and ...unblocking,
>>because we don't want to miss interrupts and block forever.
>>I somehow don't get it from the comment. :)
> 
> Not sure if I understand your concern.  However, the is_running bit
> setting/clearing should be handled in the avic_set_running below. This part
> only handles othe case when the is_running bit still set when calling
> vcpu_put (and later on loading some other vcpus). This way, when we are
> re-loading this vcpu, we can restore the is_running bit accordingly.

I think that the comment is misleading.  The saved is_running flag only
matters after svm_vcpu_blocking, yet the comment says that it handles
the irrelevant case before.

Another minor bug is that was_running isn't initialized to 1, so we need
to halt before is_running gets set.
It would be clearer to toggle a flag in svm_vcpu_(un)blocking and set
is_running = !is_blocking.  Doing so will also be immeasurably faster,
because avic_vcpu_load is called far more than svm_vcpu_(un)blocking.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-03-31 14:19       ` Radim Krčmář
@ 2016-04-05 10:07         ` Suravee Suthikulpanit
  2016-04-05 14:56           ` Radim Krčmář
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-04-05 10:07 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi Radim,

On 3/31/16 21:19, Radim Krčmář wrote:
> 2016-03-31 15:52+0700, Suravee Suthikulpanit:
>> On 03/19/2016 04:44 AM, Radim Krčmář wrote:
>>> 2016-03-18 01:09-0500, Suravee Suthikulpanit:
>>>> +	} else {
>>>> +		new_entry = READ_ONCE(*entry);
>>>> +		/**
>>>> +		 * This handles the case when vcpu is scheduled out
>>>> +		 * and has not yet not called blocking. We save the
>>>> +		 * AVIC running flag so that we can restore later.
>>>> +		 */
>>>
>>> is_running must be disabled in between ...blocking and ...unblocking,
>>> because we don't want to miss interrupts and block forever.
>>> I somehow don't get it from the comment. :)
>>
>> Not sure if I understand your concern.  However, the is_running bit
>> setting/clearing should be handled in the avic_set_running below. This part
>> only handles othe case when the is_running bit still set when calling
>> vcpu_put (and later on loading some other vcpus). This way, when we are
>> re-loading this vcpu, we can restore the is_running bit accordingly.
>
> I think that the comment is misleading.  The saved is_running flag only
> matters after svm_vcpu_blocking, yet the comment says that it handles
> the irrelevant case before.

Actually, my understanding is if the svm_vcpu_blocking() is called, the 
is_running bit would have been cleared. At this point, if the vcpu is 
unloaded. We should not need to worry about it. Is that not the case here?

> Another minor bug is that was_running isn't initialized to 1, so we need
> to halt before is_running gets set.

Just to make sure, you are referring to the point where the is_running 
is not set for first time the vcpu is loaded?  If so, I agree. Thanks 
for the good catch.

> It would be clearer to toggle a flag in svm_vcpu_(un)blocking and set
> is_running = !is_blocking.

Not sure what you meant here. We are already setting/unsetting the 
is_running bit when vcpu is blocking/unblocking. Are you suggesting just 
simply move the current avic_set_running() into the svm_vcpu_blocking 
and svm_vcpu_unblocking()?


> Doing so will also be immeasurably faster,
> because avic_vcpu_load is called far more than svm_vcpu_(un)blocking.

Actually, this is not the same as handling normal vcpu blocking and 
unblocking, which we are already setting/un-setting the is_running bit 
in the avic_set_running(). The was_running should only be set to 1 if 
the vcpu is unloaded but has not yet calling halt.

Am I missing your points somehow?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC
  2016-03-31 11:23       ` Paolo Bonzini
@ 2016-04-05 10:14         ` Suravee Suthikulpanit
  0 siblings, 0 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-04-05 10:14 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/31/16 18:23, Paolo Bonzini wrote:
>
>
> On 31/03/2016 06:15, Suravee Suthikulpanit wrote:
>>>> +                vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
>>>
>>> and X86_FEATURE_X2APIC (or something with X2APIC in name) for the
>>> bit.
>>>
>>> The code will become so obvious that the comment can be removed.
>>> :)
>>
>> Good point. I can only find example of using (X86_FEATURE_X2APIC %
>> 32) == 21.
>
> You can use bit(X86_FEATURE_X2APIC), it is defined in arch/x86/kvm/x86.h.

Ahh, thanks.

>
>>> but the MSR interface is going to exit and host-side interrupt
>>> delivery will probably still work, so I don't see a huge problem
>>> with it.
>>
>> Agree that it will still work. However, in such case, the guest code
>> would likely default to using x2APIC interface, which will not be
>> handled by the AVIC hardware, and resulting in no performance
>> improvement that we are trying to introduce.
>
> You would still get some improvement from exit-free interrupt delivery.

Let me look into this and investigate some more.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC
  2016-03-18 10:22   ` Paolo Bonzini
@ 2016-04-05 13:26     ` Suravee Suthikulpanit
  2016-04-05 15:56       ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-04-05 13:26 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 3/18/16 17:22, Paolo Bonzini wrote:
>
>
> On 18/03/2016 07:09, Suravee Suthikulpanit wrote:
>> This patch introduces a new mechanism to inject interrupt using AVIC.
>> Since VINTR is not supported when enable AVIC, we need to inject
>> interrupt via APIC backing page instead.
>>
>> This patch also adds support for AVIC doorbell, which is used by
>> KVM to signal a running vcpu to check IRR for injected interrupts.
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>
> Looks good, but I think it breaks nested virtualization.  See below.
>
>> [...]
>> @@ -2877,8 +2895,10 @@ static int clgi_interception(struct vcpu_svm *svm)
>>   	disable_gif(svm);
>>
>>   	/* After a CLGI no interrupts should come */
>> -	svm_clear_vintr(svm);
>> -	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
>> +	if (!svm_vcpu_avic_enabled(svm)) {
>> +		svm_clear_vintr(svm);
>> +		svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
>> +	}
>
> This is for nested virtualization.  Unless you support nested AVIC, the
> L2 guest should run without AVIC (i.e. IsRunning should be false) and
> use the old VINTR mechanism.

I see. I am not planning to supported nested AVIC at the L2 level for 
the moment. If it is alright, I would like to get the basic AVIC and 
IOMMU in first (unless you have a different opinion).

In that case, I think I should also make sure to not expose AVIC CPUID 
to the guest VM.

>> [...]
>> @@ -3904,6 +3942,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
>>   	 * get that intercept, this function will be called again though and
>>   	 * we'll get the vintr intercept.
>>   	 */
>> +	if (svm_vcpu_avic_enabled(svm))
>> +		return;
>
> Same here.

If I make change so that we do not expose the AVIC CPUID to the L1 
guest, then the L1 KVM driver should not be setting up AVIC for the L2 
vcpus. And, in this case, the svm_vcpu_avic_enabled(svm) should return 
false. I've not tested with nested VM. I will give that a try.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-04-05 10:07         ` Suravee Suthikulpanit
@ 2016-04-05 14:56           ` Radim Krčmář
  2016-04-06  3:40             ` Suravee Suthikulpanit
  0 siblings, 1 reply; 44+ messages in thread
From: Radim Krčmář @ 2016-04-05 14:56 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-04-05 17:07+0700, Suravee Suthikulpanit:
> On 3/31/16 21:19, Radim Krčmář wrote:
>>2016-03-31 15:52+0700, Suravee Suthikulpanit:
>>>On 03/19/2016 04:44 AM, Radim Krčmář wrote:
>>>>2016-03-18 01:09-0500, Suravee Suthikulpanit:
>>>>>+	} else {
>>>>>+		new_entry = READ_ONCE(*entry);
>>>>>+		/**
>>>>>+		 * This handles the case when vcpu is scheduled out
>>>>>+		 * and has not yet not called blocking. We save the
>>>>>+		 * AVIC running flag so that we can restore later.
>>>>>+		 */
>>>>
>>>>is_running must be disabled in between ...blocking and ...unblocking,
>>>>because we don't want to miss interrupts and block forever.
>>>>I somehow don't get it from the comment. :)
>>>
>>>Not sure if I understand your concern.  However, the is_running bit
>>>setting/clearing should be handled in the avic_set_running below. This part
>>>only handles othe case when the is_running bit still set when calling
>>>vcpu_put (and later on loading some other vcpus). This way, when we are
>>>re-loading this vcpu, we can restore the is_running bit accordingly.
>>
>>I think that the comment is misleading.  The saved is_running flag only
>>matters after svm_vcpu_blocking, yet the comment says that it handles
>>the irrelevant case before.
> 
> Actually, my understanding is if the svm_vcpu_blocking() is called, the
> is_running bit would have been cleared. At this point, if the vcpu is
> unloaded. We should not need to worry about it. Is that not the case here?

svm_vcpu_blocking() clears is_running so we don't wait infinitely if an
interrupt arrives between kvm_vcpu_check_block() and schedule().
was_running ensures that preempt notifiers don't set is_running between
kvm_vcpu_check_block() and schedule() and it's the only place where we
need to worry about was_running causing a bug.

The comment would be better if it covered the case we actually care
about and I think that we can change was_running to make it clear even
without a comment.

>>Another minor bug is that was_running isn't initialized to 1, so we need
>>to halt before is_running gets set.
> 
> Just to make sure, you are referring to the point where the is_running is
> not set for first time the vcpu is loaded?

Yes.

>>It would be clearer to toggle a flag in svm_vcpu_(un)blocking and set
>>is_running = !is_blocking.
> 
> Not sure what you meant here. We are already setting/unsetting the
> is_running bit when vcpu is blocking/unblocking. Are you suggesting just
> simply move the current avic_set_running() into the svm_vcpu_blocking and
> svm_vcpu_unblocking()?

No, that would be buggy.  (The code needs to force is_running to true on
svm_vcpu_unblocking().)

I meant to change the place where we remember that is_running must not
be true.  Something like

  svm_vcpu_blocking(struct kvm_vcpu *vcpu):
         vcpu->is_blocking = true;
         avic_set_running(vcpu, false);

  avic_vcpu_load(struct kvm_vcpu *vcpu, bool is_load):
         avic_set_running(vcpu, is_load && !vcpu->is_blocking)

>>Doing so will also be immeasurably faster,
>>because avic_vcpu_load is called far more than svm_vcpu_(un)blocking.
> 
> Actually, this is not the same as handling normal vcpu blocking and
> unblocking, which we are already setting/un-setting the is_running bit in
> the avic_set_running().

There is no practical difference after fixing the bug where was_running
starts as 0.

>                         The was_running should only be set to 1 if the vcpu
> is unloaded but has not yet calling halt.

Yes.  was_running must be 0 inside of svm_vcpu_blocking and
svm_vcpu_unblocking and should be 1 outside.

> Am I missing your points somehow?

I'm not sure ...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC
  2016-04-05 13:26     ` Suravee Suthikulpanit
@ 2016-04-05 15:56       ` Suravee Suthikulpanit
  0 siblings, 0 replies; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-04-05 15:56 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi Paolo,

On 4/5/16 20:26, Suravee Suthikulpanit wrote:
>>> @@ -2877,8 +2895,10 @@ static int clgi_interception(struct vcpu_svm
>>> *svm)
>>>       disable_gif(svm);
>>>
>>>       /* After a CLGI no interrupts should come */
>>> -    svm_clear_vintr(svm);
>>> -    svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
>>> +    if (!svm_vcpu_avic_enabled(svm)) {
>>> +        svm_clear_vintr(svm);
>>> +        svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
>>> +    }
>>
>> This is for nested virtualization.  Unless you support nested AVIC, the
>> L2 guest should run without AVIC (i.e. IsRunning should be false) and
>> use the old VINTR mechanism.
>
> I see. I am not planning to supported nested AVIC at the L2 level for
> the moment. If it is alright, I would like to get the basic AVIC and
> IOMMU in first (unless you have a different opinion).
>
> In that case, I think I should also make sure to not expose AVIC CPUID
> to the guest VM.

Actually, it should have already not set the AVIC CPUID in the L1 guest.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-04-05 14:56           ` Radim Krčmář
@ 2016-04-06  3:40             ` Suravee Suthikulpanit
  2016-04-06 12:36               ` Radim Krčmář
  0 siblings, 1 reply; 44+ messages in thread
From: Suravee Suthikulpanit @ 2016-04-06  3:40 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Radim,

On 04/05/2016 09:56 PM, Radim Krčmář wrote:
> I meant to change the place where we remember that is_running must not
> be true.  Something like
>
>    svm_vcpu_blocking(struct kvm_vcpu *vcpu):
>           vcpu->is_blocking = true;
>           avic_set_running(vcpu, false);
>
>    avic_vcpu_load(struct kvm_vcpu *vcpu, bool is_load):
>           avic_set_running(vcpu, is_load && !vcpu->is_blocking)

I assume that you also imply that we would also need:

	svm_vcpu_unblocking(struct kvm_vcpu *vcpu) {
		avic_set_running(vcpu, false);
		vcpu->is_blocking = false;
	}

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PART1 RFC v3 12/12] svm: Manage vcpu load/unload when enable AVIC
  2016-04-06  3:40             ` Suravee Suthikulpanit
@ 2016-04-06 12:36               ` Radim Krčmář
  0 siblings, 0 replies; 44+ messages in thread
From: Radim Krčmář @ 2016-04-06 12:36 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-04-06 10:40+0700, Suravee Suthikulpanit:
> On 04/05/2016 09:56 PM, Radim Krčmář wrote:
>>I meant to change the place where we remember that is_running must not
>>be true.  Something like
>>
>>   svm_vcpu_blocking(struct kvm_vcpu *vcpu):
>>          vcpu->is_blocking = true;
>>          avic_set_running(vcpu, false);
>>
>>   avic_vcpu_load(struct kvm_vcpu *vcpu, bool is_load):
>>          avic_set_running(vcpu, is_load && !vcpu->is_blocking)
> 
> I assume that you also imply that we would also need:
> 
> 	svm_vcpu_unblocking(struct kvm_vcpu *vcpu) {
> 		avic_set_running(vcpu, false);
> 		vcpu->is_blocking = false;
> 	}

Yes, thought the order should be flipped in order to avoid suboptimal
case when preemption hits us after avic_set_running().

   static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu) {
   	vcpu->is_blocking = false;
   	avic_set_running(vcpu, true);
   }

avic_set_running has barriers that prevent GCC from harmful reordering.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2016-04-06 12:36 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-18  6:09 [PART1 RFC v3 00/12] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 01/12] KVM: x86: Misc LAPIC changes to expose helper functions Suravee Suthikulpanit
2016-03-18 11:16   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 02/12] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks Suravee Suthikulpanit
2016-03-18 10:11   ` Paolo Bonzini
2016-03-29  5:27     ` Suravee Suthikulpanit
2016-03-29 10:21       ` Paolo Bonzini
2016-03-29 11:47         ` Suravee Suthikulpanit
2016-03-30 10:00           ` Suravee Suthikulpanit
2016-03-30 12:07             ` Paolo Bonzini
2016-03-30 12:18               ` Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 03/12] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks Suravee Suthikulpanit
2016-03-18 10:11   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 04/12] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Suravee Suthikulpanit
2016-03-18 10:11   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 05/12] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
2016-03-18 10:11   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 06/12] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
2016-03-18 11:21   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 07/12] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
2016-03-18 10:22   ` Paolo Bonzini
2016-04-05 13:26     ` Suravee Suthikulpanit
2016-04-05 15:56       ` Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 08/12] KVM: x86: Add trace events for AVIC Suravee Suthikulpanit
2016-03-18 10:24   ` Paolo Bonzini
2016-03-28 11:27     ` Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 09/12] svm: Add VMEXIT handlers " Suravee Suthikulpanit
2016-03-18 11:11   ` Paolo Bonzini
2016-03-18  6:09 ` [PART1 RFC v3 10/12] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
2016-03-18 20:59   ` Radim Krčmář
2016-03-31  4:15     ` Suravee Suthikulpanit
2016-03-31 11:23       ` Paolo Bonzini
2016-04-05 10:14         ` Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 11/12] svm: Do not intercept CR8 " Suravee Suthikulpanit
2016-03-18 21:10   ` Radim Krčmář
2016-03-30 12:15     ` Suravee Suthikulpanit
2016-03-18  6:09 ` [PART1 RFC v3 12/12] svm: Manage vcpu load/unload " Suravee Suthikulpanit
2016-03-18 21:44   ` Radim Krčmář
2016-03-31  8:52     ` Suravee Suthikulpanit
2016-03-31 14:19       ` Radim Krčmář
2016-04-05 10:07         ` Suravee Suthikulpanit
2016-04-05 14:56           ` Radim Krčmář
2016-04-06  3:40             ` Suravee Suthikulpanit
2016-04-06 12:36               ` Radim Krčmář

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).