linux-kernel.vger.kernel.org archive mirror
* [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support
@ 2016-03-04 20:45 Suravee Suthikulpanit
  2016-03-04 20:45 ` [PART1 RFC v2 01/10] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
                   ` (9 more replies)
  0 siblings, 10 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:45 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

CHANGES FROM RFCv1:
==================
    * Rebased from tip.git
    * Use the vAPIC backing page as the emulated LAPIC register page.
    * Clarified with an AMD hardware engineer that AVIC sets the IRR of
      all targeted cores, whether they are running or not. It sends
      doorbells to the running ones to get them to evaluate their new
      state. If one or more targets were not running, it generates a
      #VMEXIT to tell the host to go run the non-running guests, and
      reports the ID of one such guest as a hint for the hypervisor to
      track them all down.
    * Rewrite the logic for handling AVIC_INCOMPLETE_IPI #vmexit for
      IPI target not running case based on information above.
    * Rewrite the logic in avic_vcpu_load() and avic_set_running().
    * Rewrite the interrupt injection to remove the avic_pending_cnt
      variable.
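The clarified delivery semantics can be sketched as a small C model (an editor's illustration with made-up names, not kernel code): IRR is set for every target, running targets get a doorbell, and the ID of one non-running target is reported so the hypervisor can handle the AVIC_INCOMPLETE_IPI #VMEXIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vcpu {
	bool running;
	bool irr_set;       /* IRR bit for the IPI vector */
	bool doorbell_sent; /* doorbell received while running */
};

/*
 * Deliver an IPI to a set of target vCPUs following the behavior
 * described above: IRR is set for every target; running targets get a
 * doorbell; if any target is not running, return its index as the hint
 * reported with the AVIC_INCOMPLETE_IPI #VMEXIT. Returns -1 when all
 * targets were running (no #VMEXIT needed).
 */
static int model_avic_deliver_ipi(struct model_vcpu *vcpu, size_t n)
{
	int first_not_running = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		vcpu[i].irr_set = true;		/* always set, running or not */
		if (vcpu[i].running)
			vcpu[i].doorbell_sent = true;
		else if (first_not_running < 0)
			first_not_running = (int)i;
	}
	return first_not_running;
}
```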

GITHUB
======
Latest git tree can be found at:
    http://github.com/ssuthiku/linux.git    avic_part1_rfc_v2

OVERVIEW
========
This patch set is the first of a two-part patch series introducing
the new AMD Advanced Virtual Interrupt Controller (AVIC) support.

Basically, SVM AVIC hardware virtualizes the local APIC registers of each
vCPU via the virtual APIC (vAPIC) backing page. This allows the guest to
access certain APIC registers without requiring the hypervisor to emulate
the hardware behavior. More information about AVIC can be found in the
AMD64 Architecture Programmer's Manual Volume 2 - System Programming:

  http://support.amd.com/TechDocs/24593.pdf

For SVM AVIC, we extend the existing kvm_amd driver to:
  * Check CPUID to detect AVIC support in the processor
  * Program new fields in VMCB to enable AVIC
  * Introduce new AVIC data structures and add code to manage them
  * Handle two new AVIC #VMEXITs
  * Add new interrupt injection code using the vAPIC backing page
    instead of the existing V_IRQ, V_INTR_PRIO, V_INTR_VECTOR,
    and V_IGN_TPR fields

Currently, this patch series does not enable AVIC by default.
Users can enable SVM AVIC by loading the kvm-amd module with the
avic=1 parameter (e.g. insmod kvm-amd avic=1).

Later, in part 2, we will introduce the IOMMU AVIC support, which
speeds up the PCI device passthrough use case by allowing the IOMMU
hardware to inject interrupts directly into the guest via the vAPIC
backing page.

PERFORMANCE RESULTS
===================
Currently, AVIC is supported in the AMD family 15h models 6Xh
(Carrizo) processors, which were therefore used to collect the
performance data shown below.

Generally, SVM AVIC alone (w/o IOMMU AVIC) should speed up IPI
delivery, since the hypervisor no longer needs a VMEXIT to inject
these interrupts. It should also speed up the case where the
hypervisor injects an interrupt into a running guest, by setting the
corresponding IRR bit in the vAPIC backing page and writing the
AVIC_DOORBELL MSR.
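The injection fast path described above boils down to setting the target vector's IRR bit in the backing page and then writing the AVIC_DOORBELL MSR. A minimal model of the IRR bit placement (256 vectors across eight 32-bit registers spaced 16 bytes apart, per the APIC register layout; the function names are illustrative, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* APIC IRR: 256 vectors spread over eight 32-bit registers, 16 bytes apart. */
#define MODEL_APIC_IRR_BASE 0x200

/* Set the IRR bit for 'vector' in a 4KB vAPIC backing page image. */
static void model_set_irr(uint8_t *backing_page, int vector)
{
	uint32_t *reg = (uint32_t *)(backing_page + MODEL_APIC_IRR_BASE +
				     (vector / 32) * 0x10);
	*reg |= 1u << (vector % 32);
}

static bool model_test_irr(const uint8_t *backing_page, int vector)
{
	const uint32_t *reg =
		(const uint32_t *)(backing_page + MODEL_APIC_IRR_BASE +
				   (vector / 32) * 0x10);
	return (*reg >> (vector % 32)) & 1;
}
```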

IPI PERFORMANCE
===============

* BENCHMARK 1: HACKBENCH
For IPI, I have collected some performance numbers on 2 and 4 vCPUs
running hackbench with the following details:

  hackbench -p -l 100000
  Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
  Each sender will pass 100000 messages of 100 bytes

                |    2 vcpus     |    4 vcpus
 ------------------------------------------------
        Vanilla |  273.76        |  190.21
  AVIC disabled |  260.51 (~5%)  |  184.40 (~5%)
           AVIC |  248.53 (~10%) |  155.01 (~20%)

Note: The unit of the results above is seconds (lower is better).

OVERALL PERFORMANCE
===================
Enabling AVIC should help speed up workloads that generate a large
number of interrupts. However, it requires additional logic to
maintain AVIC-specific data structures during vCPU load/unload
due to vCPU scheduling.

The goal is to minimize the overhead of AVIC in most cases, so that
overall performance is equivalent or better when AVIC is enabled.

* BENCHMARK 1: TAR DECOMPRESSION
This test measures the average running time (over 10 runs) of the
following tar decompression command with 1, 2, and 4 vCPUs:

   tar xf linux-4.3.3.tar.xz

                |      4 vcpus
 ---------------------------------
        Vanilla |  10.26
  AVIC disabled |  10.10 (~1.5%)
           AVIC |  10.07 (~1.8%)

Note: The unit of the results above is seconds (lower is better).

* BENCHMARK 2: NETPERF w/ virtual network
This test creates a virtual network by setting up a bridge and tap
device on the host and passing it into the VM as a virtio-net-pci
device w/ vhost. It then sets up netserver on the host machine and
runs netperf in the VM with the following options:

  netperf -H <netserver ip> -l 60 -t TCP_RR -D 2

                |    1 vcpu
 ------------------------------------
        Vanilla |  21623.887
  AVIC disabled |  21538.09 (~-0.4%)
           AVIC |  21712.68 (~0.4%)

Note: The unit of the results above is trans/sec (higher is better).

Preliminary results of both benchmarks show that AVIC performance is
slightly better than in the other two cases.

CURRENT UNTESTED USE-CASES
===========================
    - VM Migration (work in progress)
    - Nested VM

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Suravee Suthikulpanit (10):
  KVM: x86: Misc LAPIC changes to exposes helper functions
  KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking
  svm: Introduce new AVIC VMCB registers
  svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING
  KVM: x86: Detect and Initialize AVIC support
  svm: Add interrupt injection via AVIC
  svm: Add VMEXIT handlers for AVIC
  svm: Do not expose x2APIC when enable AVIC
  svm: Do not intercept CR8 when enable AVIC
  svm: Manage vcpu load/unload when enable AVIC

 arch/x86/include/asm/kvm_host.h |  19 +-
 arch/x86/include/asm/svm.h      |  38 +-
 arch/x86/include/uapi/asm/svm.h |   9 +-
 arch/x86/kvm/lapic.c            |  51 ++-
 arch/x86/kvm/lapic.h            |   7 +
 arch/x86/kvm/svm.c              | 937 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 1003 insertions(+), 58 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PART1 RFC v2 01/10] KVM: x86: Misc LAPIC changes to exposes helper functions
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
@ 2016-03-04 20:45 ` Suravee Suthikulpanit
  2016-03-04 20:46 ` [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking Suravee Suthikulpanit
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:45 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Export LAPIC utility functions and macros so they can be reused.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/lapic.c | 51 +++++++++++++++++++++++++++------------------------
 arch/x86/kvm/lapic.h |  7 +++++++
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3a045f3..a8d91b89 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -59,9 +59,8 @@
 /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
 #define apic_debug(fmt, arg...)
 
-#define APIC_LVT_NUM			6
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION			(0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
 #define APIC_SHORT_MASK			0xc0000
@@ -94,10 +93,11 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
 		apic_test_vector(vector, apic->regs + APIC_IRR);
 }
 
-static inline void apic_set_vector(int vec, void *bitmap)
+void kvm_lapic_set_vector(int vec, void *bitmap)
 {
 	set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_set_vector);
 
 static inline void apic_clear_vector(int vec, void *bitmap)
 {
@@ -290,7 +290,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	apic_set_reg(apic, APIC_LVR, v);
 }
 
-static const unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
+static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
 	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
 	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
@@ -351,7 +351,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
 {
-	apic_set_vector(vec, apic->regs + APIC_IRR);
+	kvm_lapic_set_vector(vec, apic->regs + APIC_IRR);
 	/*
 	 * irr_pending must be true if any interrupt is pending; set it after
 	 * APIC_IRR to avoid race with apic_clear_irr
@@ -674,6 +674,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 		return false;
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_apic_match_dest);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
@@ -844,7 +845,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 
 		if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) {
 			if (trig_mode)
-				apic_set_vector(vector, apic->regs + APIC_TMR);
+				kvm_lapic_set_vector(vector, apic->regs + APIC_TMR);
 			else
 				apic_clear_vector(vector, apic->regs + APIC_TMR);
 		}
@@ -1109,7 +1110,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 	return container_of(dev, struct kvm_lapic, dev);
 }
 
-static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 		void *data)
 {
 	unsigned char alignment = offset & 0xf;
@@ -1146,6 +1147,7 @@ static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);
 
 static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
 {
@@ -1163,7 +1165,7 @@ static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 	if (!apic_mmio_in_range(apic, address))
 		return -EOPNOTSUPP;
 
-	apic_reg_read(apic, offset, len, data);
+	kvm_lapic_reg_read(apic, offset, len, data);
 
 	return 0;
 }
@@ -1348,7 +1350,7 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
 	}
 }
 
-static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 {
 	int ret = 0;
 
@@ -1395,7 +1397,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 			u32 lvt_val;
 
-			for (i = 0; i < APIC_LVT_NUM; i++) {
+			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
 				lvt_val = kvm_apic_get_reg(apic,
 						       APIC_LVTT + 0x10 * i);
 				apic_set_reg(apic, APIC_LVTT + 0x10 * i,
@@ -1467,7 +1469,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SELF_IPI:
 		if (apic_x2apic_mode(apic)) {
-			apic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
+			kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
 		} else
 			ret = 1;
 		break;
@@ -1479,6 +1481,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_debug("Local APIC Write to read-only register %x\n", reg);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_write);
 
 static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 			    gpa_t address, int len, const void *data)
@@ -1508,7 +1511,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 		apic_debug("%s: offset 0x%x with length 0x%x, and value is "
 			   "0x%x\n", __func__, offset, len, val);
 
-	apic_reg_write(apic, offset & 0xff0, val);
+	kvm_lapic_reg_write(apic, offset & 0xff0, val);
 
 	return 0;
 }
@@ -1516,7 +1519,7 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
 	if (kvm_vcpu_has_lapic(vcpu))
-		apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
+		kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
@@ -1528,10 +1531,10 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
-	apic_reg_write(vcpu->arch.apic, offset, val);
+	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
@@ -1670,7 +1673,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < APIC_LVT_NUM; i++)
+	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
 		apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
@@ -2073,8 +2076,8 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -2091,10 +2094,10 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 		return 1;
 	}
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
@@ -2110,8 +2113,8 @@ int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
 
 	/* if this is ICR write vector before command */
 	if (reg == APIC_ICR)
-		apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return apic_reg_write(apic, reg, (u32)data);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
 }
 
 int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
@@ -2122,10 +2125,10 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return 1;
 
-	if (apic_reg_read(apic, reg, 4, &low))
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
 		return 1;
 	if (reg == APIC_ICR)
-		apic_reg_read(apic, APIC_ICR2, 4, &high);
+		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
 
 	*data = (((u64)high) << 32) | low;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 41bdb35..936fd10 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -7,6 +7,7 @@
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
+#define KVM_APIC_LVT_NUM	6
 
 struct kvm_timer {
 	struct hrtimer timer;
@@ -56,6 +57,12 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+void kvm_lapic_set_vector(int vec, void *bitmap);
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+		       void *data);
+bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+			   int short_hand, unsigned int dest, int dest_mode);
 
 void __kvm_apic_update_irr(u32 *pir, void *regs);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
-- 
1.9.1


* [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-03-04 20:45 ` [PART1 RFC v2 01/10] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 15:42   ` Paolo Bonzini
  2016-03-04 20:46 ` [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

This patch adds new function hooks to struct kvm_x86_ops and calls
them from kvm_arch_vcpu_blocking() and kvm_arch_vcpu_unblocking().

These will be used later on by the SVM AVIC code.
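The hook wiring can be sketched with a minimal stand-alone model (an editor's illustration, not the kernel structs): callbacks in the ops table may be left NULL, in which case the arch wrappers do nothing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the optional-callback pattern used by
 * kvm_x86_ops: hooks may be NULL, and callers check before invoking.
 */
struct model_x86_ops {
	void (*vcpu_blocking)(int *running);
	void (*vcpu_unblocking)(int *running);
};

static void model_avic_set_not_running(int *running) { *running = 0; }
static void model_avic_set_running(int *running)     { *running = 1; }

static void model_arch_vcpu_blocking(const struct model_x86_ops *ops,
				     int *running)
{
	if (ops->vcpu_blocking)
		ops->vcpu_blocking(running);
}

static void model_arch_vcpu_unblocking(const struct model_x86_ops *ops,
				       int *running)
{
	if (ops->vcpu_unblocking)
		ops->vcpu_unblocking(running);
}
```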

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 17 +++++++++++++++--
 arch/x86/kvm/svm.c              | 12 ++++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 44adbb8..9a61669 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -969,6 +969,10 @@ struct kvm_x86_ops {
 	 */
 	int (*pre_block)(struct kvm_vcpu *vcpu);
 	void (*post_block)(struct kvm_vcpu *vcpu);
+
+	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
+	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
+
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 };
@@ -1320,7 +1324,16 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->vcpu_blocking)
+		kvm_x86_ops->vcpu_blocking(vcpu);
+}
+
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->vcpu_unblocking)
+		kvm_x86_ops->vcpu_unblocking(vcpu);
+}
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c13a64b..28f8618 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1265,6 +1265,16 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
 
+static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	avic_set_running(vcpu, false);
+}
+
+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	avic_set_running(vcpu, true);
+}
+
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
 	return to_svm(vcpu)->vmcb->save.rflags;
@@ -4321,6 +4331,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.prepare_guest_switch = svm_prepare_guest_switch,
 	.vcpu_load = svm_vcpu_load,
 	.vcpu_put = svm_vcpu_put,
+	.vcpu_blocking = svm_vcpu_blocking,
+	.vcpu_unblocking = svm_vcpu_unblocking,
 
 	.update_bp_intercept = update_bp_intercept,
 	.get_msr = svm_get_msr,
-- 
1.9.1


* [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
  2016-03-04 20:45 ` [PART1 RFC v2 01/10] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
  2016-03-04 20:46 ` [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 15:44   ` Paolo Bonzini
  2016-03-04 20:46 ` [PART1 RFC v2 04/10] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Introduce the new AVIC VMCB registers. Also break down the int_ctl
register into a bit-field for ease of use.
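The int_ctl bit-field view must line up with the architectural bit positions (V_IRQ at bit 8, V_INTR_MASKING at bit 24, AVIC enable at bit 31). A stand-alone sketch of that union (an editor's illustration; bit-field layout is compiler/ABI dependent, and this matches GCC on little-endian x86):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the int_ctl union: the packed bit-field overlays the raw
 * 32-bit register, with bits allocated from the least significant end
 * (GCC on little-endian x86).
 */
union model_int_ctl {
	uint32_t raw;
	struct __attribute__((__packed__)) {
		uint32_t v_tpr		: 8,
			 v_irq		: 1,
			 reserved_3	: 7,
			 v_intr_prio	: 4,
			 v_ign_tpr	: 1,
			 reserved_4	: 3,
			 v_intr_masking	: 1,
			 reserved_5	: 6,
			 avic_enable	: 1;
	};
};
```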

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/svm.h | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6136d99..db5d7af 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -67,10 +67,24 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 asid;
 	u8 tlb_ctl;
 	u8 reserved_2[3];
-	u32 int_ctl;
+	union { /* Offset 0x60 */
+		u32 int_ctl;
+
+		struct __attribute__ ((__packed__)) {
+			u32 v_tpr		: 8,
+			    v_irq		: 1,
+			    reserved_3		: 7,
+			    v_intr_prio		: 4,
+			    v_ign_tpr		: 1,
+			    reserved_4		: 3,
+			    v_intr_masking	: 1,
+			    reserved_5		: 6,
+			    avic_enable		: 1;
+		};
+	};
 	u32 int_vector;
 	u32 int_state;
-	u8 reserved_3[4];
+	u8 reserved_6[4];
 	u32 exit_code;
 	u32 exit_code_hi;
 	u64 exit_info_1;
@@ -78,17 +92,22 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 exit_int_info;
 	u32 exit_int_info_err;
 	u64 nested_ctl;
-	u8 reserved_4[16];
+	u64 avic_vapic_bar;
+	u8 reserved_7[8];
 	u32 event_inj;
 	u32 event_inj_err;
 	u64 nested_cr3;
 	u64 lbr_ctl;
 	u32 clean;
-	u32 reserved_5;
+	u32 reserved_8;
 	u64 next_rip;
 	u8 insn_len;
 	u8 insn_bytes[15];
-	u8 reserved_6[800];
+	u64 avic_bk_page;	/* Offset 0xe0 */
+	u8 reserved_9[8];	/* Offset 0xe8 */
+	u64 avic_log_apic_id;	/* Offset 0xf0 */
+	u64 avic_phy_apic_id;	/* Offset 0xf8 */
+	u8 reserved_10[768];
 };
 
 
-- 
1.9.1


* [PART1 RFC v2 04/10] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-04 20:46 ` [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Now that we have defined the bit fields, use them to replace the
existing macros. This patch should not introduce any functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/svm.h |  9 ---------
 arch/x86/kvm/svm.c         | 24 ++++++++++++------------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index db5d7af..7bb34c9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -118,18 +118,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 
 #define V_TPR_MASK 0x0f
 
-#define V_IRQ_SHIFT 8
-#define V_IRQ_MASK (1 << V_IRQ_SHIFT)
-
-#define V_INTR_PRIO_SHIFT 16
-#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
-
 #define V_IGN_TPR_SHIFT 20
 #define V_IGN_TPR_MASK (1 << V_IGN_TPR_SHIFT)
 
-#define V_INTR_MASKING_SHIFT 24
-#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
-
 #define SVM_INTERRUPT_SHADOW_MASK 1
 
 #define SVM_IOIO_STR_SHIFT 2
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 28f8618..6ab66d0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1052,7 +1052,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->int_ctl = V_INTR_MASKING_MASK;
+	control->v_intr_masking = 1;
 
 	init_seg(&save->es);
 	init_seg(&save->ss);
@@ -2316,7 +2316,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	/* We always set V_INTR_MASKING and remember the old value in hflags */
 	if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
-		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
+		nested_vmcb->control.v_intr_masking = 0;
 
 	/* Restore the original control entries */
 	copy_vmcb_control_area(vmcb, hsave);
@@ -2526,8 +2526,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	svm->nested.intercept            = nested_vmcb->control.intercept;
 
 	svm_flush_tlb(&svm->vcpu);
-	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
-	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl;
+	svm->vmcb->control.v_intr_masking = 1;
+	if (nested_vmcb->control.v_intr_masking)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
@@ -2680,7 +2681,7 @@ static int clgi_interception(struct vcpu_svm *svm)
 
 	/* After a CLGI no interrupts should come */
 	svm_clear_vintr(svm);
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	svm->vmcb->control.v_irq = 0;
 
 	mark_dirty(svm->vmcb, VMCB_INTR);
 
@@ -3257,7 +3258,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 {
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	svm_clear_vintr(svm);
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	svm->vmcb->control.v_irq = 0;
 	mark_dirty(svm->vmcb, VMCB_INTR);
 	++svm->vcpu.stat.irq_window_exits;
 	return 1;
@@ -3568,11 +3569,11 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
 	struct vmcb_control_area *control;
 
+
 	control = &svm->vmcb->control;
 	control->int_vector = irq;
-	control->int_ctl &= ~V_INTR_PRIO_MASK;
-	control->int_ctl |= V_IRQ_MASK |
-		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
+	control->v_intr_prio = 0xf;
+	control->v_irq = 1;
 	mark_dirty(svm->vmcb, VMCB_INTR);
 }
 
@@ -3738,7 +3739,7 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 		return;
 
 	if (!is_cr_intercept(svm, INTERCEPT_CR8_WRITE)) {
-		int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
+		int cr8 = svm->vmcb->control.v_tpr & V_TPR_MASK;
 		kvm_set_cr8(vcpu, cr8);
 	}
 }
@@ -3752,8 +3753,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 		return;
 
 	cr8 = kvm_get_cr8(vcpu);
-	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
-	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
 }
 
 static void svm_complete_interrupts(struct vcpu_svm *svm)
-- 
1.9.1


* [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 04/10] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 16:41   ` Paolo Bonzini
  2016-03-04 20:46 ` [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

This patch introduces the AVIC-related data structures and the AVIC
initialization code.

There are three main data structures for AVIC:
    * Virtual APIC (vAPIC) backing page (per-VCPU)
    * Physical APIC ID table (per-VM)
    * Logical APIC ID table (per-VM)

In order to accommodate the new per-VM tables, we introduce
a new per-VM arch-specific void pointer, struct kvm_arch.arch_data.
This will point to the newly introduced struct svm_vm_data.

Currently, AVIC is disabled by default. Users can manually enable
AVIC via the kernel boot option kvm-amd.avic=1 or by loading the
kvm-amd module with the parameter avic=1.
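As a rough illustration of the physical APIC ID table entry layout defined later in this patch (bits 7:0 host APIC ID, bits 51:12 backing-page host physical address, bit 62 is_running, bit 63 valid), here is a hypothetical helper; the function name and packing code are the editor's, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a 64-bit physical APIC ID table entry: host APIC ID in bits
 * 7:0, bits 51:12 of the vAPIC backing-page host physical address in
 * bits 51:12, is_running in bit 62, valid in bit 63.
 */
static uint64_t model_make_phy_ait_entry(uint8_t host_apic_id,
					 uint64_t backing_hpa,
					 int is_running)
{
	uint64_t e = 0;

	e |= host_apic_id;					/* bits 7:0   */
	e |= ((backing_hpa >> 12) & ((1ULL << 40) - 1)) << 12;	/* bits 51:12 */
	if (is_running)
		e |= 1ULL << 62;				/* is_running */
	e |= 1ULL << 63;					/* valid      */
	return e;
}
```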

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              | 416 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 417 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9a61669..15f6cd8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -754,6 +754,8 @@ struct kvm_arch {
 
 	bool irqchip_split;
 	u8 nr_reserved_ioapic_pins;
+
+	void *arch_data;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6ab66d0..c703149 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -14,6 +14,9 @@
  * the COPYING file in the top-level directory.
  *
  */
+
+#define pr_fmt(fmt) "SVM: " fmt
+
 #include <linux/kvm_host.h>
 
 #include "irq.h"
@@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define TSC_RATIO_MIN		0x0000000000000001ULL
 #define TSC_RATIO_MAX		0x000000ffffffffffULL
 
+#define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
+
+/* NOTE: Current max index allowed for physical APIC ID table is 255 */
+#define AVIC_PHY_APIC_ID_MAX	0xFF
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -162,6 +170,37 @@ struct vcpu_svm {
 
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled	: 1;
+
+	struct page *avic_bk_page;
+	void *in_kernel_lapic_regs;
+};
+
+struct __attribute__ ((__packed__))
+svm_avic_log_ait_entry {
+	u32 guest_phy_apic_id	: 8,
+	    res			: 23,
+	    valid		: 1;
+};
+
+struct __attribute__ ((__packed__))
+svm_avic_phy_ait_entry {
+	u64 host_phy_apic_id	: 8,
+	    res1		: 4,
+	    bk_pg_ptr		: 40,
+	    res2		: 10,
+	    is_running		: 1,
+	    valid		: 1;
+};
+
+/* Note: This structure is per VM */
+struct svm_vm_data {
+	atomic_t count;
+	u32 ldr_mode;
+	u32 avic_max_vcpu_id;
+	u32 avic_tag;
+
+	struct page *avic_log_ait_page;
+	struct page *avic_phy_ait_page;
 };
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
@@ -205,6 +244,10 @@ module_param(npt, int, S_IRUGO);
 static int nested = true;
 module_param(nested, int, S_IRUGO);
 
+/* enable / disable AVIC */
+static int avic;
+module_param(avic, int, S_IRUGO);
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -234,6 +277,13 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
+
+static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
+{
+	svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
+}
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
@@ -923,6 +973,13 @@ static __init int svm_hardware_setup(void)
 	} else
 		kvm_disable_tdp();
 
+	if (avic && (!npt_enabled || !boot_cpu_has(X86_FEATURE_AVIC)))
+		avic = false;
+
+	if (avic) {
+		printk(KERN_INFO "kvm: AVIC enabled\n");
+	}
+
 	return 0;
 
 err:
@@ -1000,6 +1057,41 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
+static void avic_init_vmcb(struct vcpu_svm *svm)
+{
+	void *vapic_bkpg;
+	struct vmcb *vmcb = svm->vmcb;
+	struct svm_vm_data *vm_data = svm->vcpu.kvm->arch.arch_data;
+	phys_addr_t bpa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+	phys_addr_t lpa = PFN_PHYS(page_to_pfn(vm_data->avic_log_ait_page));
+	phys_addr_t ppa = PFN_PHYS(page_to_pfn(vm_data->avic_phy_ait_page));
+
+	if (!vmcb)
+		return;
+
+	pr_debug("%s: bpa=%#llx, lpa=%#llx, ppa=%#llx\n",
+		 __func__, (unsigned long long)bpa,
+		 (unsigned long long)lpa, (unsigned long long)ppa);
+
+	/*
+	 * Copy the emulated LAPIC register values to the vAPIC
+	 * backing page, then switch the register frame to point
+	 * at the backing page.
+	 */
+	if (!svm->in_kernel_lapic_regs)
+		svm->in_kernel_lapic_regs = svm->vcpu.arch.apic->regs;
+
+	vapic_bkpg = pfn_to_kaddr(page_to_pfn(svm->avic_bk_page));
+	memcpy(vapic_bkpg, svm->in_kernel_lapic_regs, PAGE_SIZE);
+	svm->vcpu.arch.apic->regs = vapic_bkpg;
+
+	vmcb->control.avic_bk_page = bpa & AVIC_HPA_MASK;
+	vmcb->control.avic_log_apic_id = lpa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id = ppa & AVIC_HPA_MASK;
+	vmcb->control.avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
+	vmcb->control.avic_enable = 1;
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1113,6 +1205,293 @@ static void init_vmcb(struct vcpu_svm *svm)
 	mark_all_dirty(svm->vmcb);
 
 	enable_gif(svm);
+
+	if (avic)
+		avic_init_vmcb(svm);
+}
+
+static struct svm_avic_phy_ait_entry *
+avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
+{
+	struct svm_avic_phy_ait_entry *avic_phy_ait;
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+
+	if (!vm_data)
+		return NULL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	if (index >= 0xff)
+		return NULL;
+
+	avic_phy_ait = page_address(vm_data->avic_phy_ait_page);
+
+	return &avic_phy_ait[index];
+}
+
+static struct svm_avic_log_ait_entry *
+avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
+{
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+	int index;
+	struct svm_avic_log_ait_entry *avic_log_ait;
+
+	if (!vm_data)
+		return NULL;
+
+	if (is_flat) { /* flat */
+		if (mda > 7)
+			return NULL;
+		index = mda;
+	} else { /* cluster */
+		int apic_id = mda & 0xf;
+		int cluster_id = (mda & 0xf0) >> 4;
+
+		if (apic_id >= 4 || cluster_id >= 0xf)
+			return NULL;
+		index = (cluster_id << 2) + apic_id;
+	}
+	avic_log_ait = (struct svm_avic_log_ait_entry *)
+				page_address(vm_data->avic_log_ait_page);
+
+	return &avic_log_ait[index];
+}
+
+static inline void
+avic_set_bk_page_entry(struct vcpu_svm *svm, int reg_off, u32 val)
+{
+	void *avic_bk = page_address(svm->avic_bk_page);
+
+	*((u32 *) (avic_bk + reg_off)) = val;
+}
+
+static inline u32 *avic_get_bk_page_entry(struct vcpu_svm *svm, u32 offset)
+{
+	char *tmp = (char *)page_address(svm->avic_bk_page);
+
+	return (u32 *)(tmp+offset);
+}
+
+static int avic_init_log_apic_entry(struct kvm_vcpu *vcpu, u8 g_phy_apic_id,
+				    u8 log_apic_id)
+{
+	u32 mod;
+	struct svm_avic_log_ait_entry *entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!svm)
+		return -EINVAL;
+
+	mod = (*avic_get_bk_page_entry(svm, APIC_DFR) >> 28) & 0xf;
+	entry = avic_get_log_ait_entry(vcpu, log_apic_id, (mod == 0xf));
+	if (!entry)
+		return -EINVAL;
+	entry->guest_phy_apic_id = g_phy_apic_id;
+	entry->valid = 1;
+
+	return 0;
+}
+
+static int avic_init_bk_page(struct kvm_vcpu *vcpu)
+{
+	u64 addr;
+	struct page *page;
+	int id = vcpu->vcpu_id;
+	struct kvm *kvm = vcpu->kvm;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	addr = APIC_DEFAULT_PHYS_BASE + (id * PAGE_SIZE);
+	page = gfn_to_page(kvm, addr >> PAGE_SHIFT);
+	if (is_error_page(page))
+		return -EFAULT;
+
+	/*
+	 * Do not pin the page in memory, so that memory hot-unplug
+	 * is able to migrate it.
+	 */
+	put_page(page);
+
+	/* Setting up AVIC Backing Page */
+	svm->avic_bk_page = page;
+
+	clear_page(kmap(page));
+	pr_debug("%s: vAPIC bk page: cpu=%u, addr=%#llx, pa=%#llx\n",
+		 __func__, id, addr,
+		 (unsigned long long) PFN_PHYS(page_to_pfn(page)));
+
+	avic_init_vmcb(svm);
+
+	return 0;
+}
+
+static inline void avic_unalloc_bk_page(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (svm->avic_bk_page)
+		kunmap(svm->avic_bk_page);
+}
+
+static int avic_alloc_bk_page(struct vcpu_svm *svm, int id)
+{
+	int ret = 0, i;
+	bool realloc = false;
+	struct kvm_vcpu *vcpu;
+	struct kvm *kvm = svm->vcpu.kvm;
+	struct svm_vm_data *vm_data = kvm->arch.arch_data;
+
+	mutex_lock(&kvm->slots_lock);
+
+	/* Check if we have already allocated a vAPIC backing
+	 * page for this vCPU. If not, we need to reallocate
+	 * it and re-assign all other vCPUs.
+	 */
+	if (kvm->arch.apic_access_page_done &&
+	    (id > vm_data->avic_max_vcpu_id)) {
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			avic_unalloc_bk_page(vcpu);
+
+		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+					0, 0);
+		realloc = true;
+		vm_data->avic_max_vcpu_id = 0;
+	}
+
+	/*
+	 * Allocate vAPIC backing pages up to
+	 * the max vCPU ID.
+	 */
+	if (id >= vm_data->avic_max_vcpu_id) {
+		ret = __x86_set_memory_region(kvm,
+					      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+					      APIC_DEFAULT_PHYS_BASE,
+					      PAGE_SIZE * (id + 1));
+		if (ret)
+			goto out;
+
+		vm_data->avic_max_vcpu_id = id;
+	}
+
+	/* Reinit vAPIC backing page for existing vcpus */
+	if (realloc)
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			avic_init_bk_page(vcpu);
+
+	avic_init_bk_page(&svm->vcpu);
+
+	kvm->arch.apic_access_page_done = true;
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+
+static void avic_vm_uninit(struct kvm *kvm)
+{
+	struct svm_vm_data *vm_data = kvm->arch.arch_data;
+
+	if (!vm_data)
+		return;
+
+	if (vm_data->avic_log_ait_page)
+		__free_page(vm_data->avic_log_ait_page);
+	if (vm_data->avic_phy_ait_page)
+		__free_page(vm_data->avic_phy_ait_page);
+	kfree(vm_data);
+	kvm->arch.arch_data = NULL;
+}
+
+static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
+
+	if (!avic)
+		return;
+
+	avic_unalloc_bk_page(vcpu);
+	svm->vcpu.arch.apic->regs = svm->in_kernel_lapic_regs;
+	svm->in_kernel_lapic_regs = NULL;
+
+	if (vm_data &&
+	    (atomic_read(&vm_data->count) == 0 ||
+	     atomic_dec_and_test(&vm_data->count)))
+		avic_vm_uninit(vcpu->kvm);
+}
+
+static atomic_t avic_tag_gen = ATOMIC_INIT(1);
+
+static inline u32 avic_get_next_tag(void)
+{
+	/* Use atomic_inc_return() so concurrent callers get unique tags. */
+	return (u32)atomic_inc_return(&avic_tag_gen);
+}
+
+static int avic_vm_init(struct kvm *kvm)
+{
+	int err = -ENOMEM;
+	struct svm_vm_data *vm_data;
+	struct page *avic_phy_ait_page;
+	struct page *avic_log_ait_page;
+
+	vm_data = kzalloc(sizeof(struct svm_vm_data), GFP_KERNEL);
+	if (!vm_data)
+		return err;
+
+	kvm->arch.arch_data = vm_data;
+	atomic_set(&vm_data->count, 0);
+
+	/* Allocating physical APIC ID table (4KB) */
+	avic_phy_ait_page = alloc_page(GFP_KERNEL);
+	if (!avic_phy_ait_page)
+		goto free_avic;
+
+	vm_data->avic_phy_ait_page = avic_phy_ait_page;
+	clear_page(page_address(avic_phy_ait_page));
+
+	/* Allocating logical APIC ID table (4KB) */
+	avic_log_ait_page = alloc_page(GFP_KERNEL);
+	if (!avic_log_ait_page)
+		goto free_avic;
+
+	vm_data->avic_log_ait_page = avic_log_ait_page;
+	clear_page(page_address(avic_log_ait_page));
+
+	vm_data->avic_tag = avic_get_next_tag();
+
+	return 0;
+
+free_avic:
+	avic_vm_uninit(kvm);
+	return err;
+}
+
+static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
+{
+	int err;
+	struct svm_vm_data *vm_data = NULL;
+
+	/* Note: svm_vm_data is per VM */
+	if (!kvm->arch.arch_data) {
+		err = avic_vm_init(kvm);
+		if (err)
+			return err;
+	}
+
+	err = avic_alloc_bk_page(svm, id);
+	if (err) {
+		avic_vcpu_uninit(&svm->vcpu);
+		return err;
+	}
+
+	vm_data = kvm->arch.arch_data;
+	atomic_inc(&vm_data->count);
+
+	return 0;
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1131,6 +1510,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+
+	if (avic && !init_event)
+		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
 static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
@@ -1169,6 +1551,12 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!hsave_page)
 		goto free_page3;
 
+	if (avic) {
+		err = avic_vcpu_init(kvm, svm, id);
+		if (err)
+			goto free_page4;
+	}
+
 	svm->nested.hsave = page_address(hsave_page);
 
 	svm->msrpm = page_address(msrpm_pages);
@@ -1187,6 +1575,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	return &svm->vcpu;
 
+free_page4:
+	__free_page(hsave_page);
 free_page3:
 	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
@@ -1209,6 +1599,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+	avic_vcpu_uninit(vcpu);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -3382,6 +3773,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
 	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
 	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
+	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
 	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
 	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
 	pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
@@ -3613,11 +4005,31 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 
 static bool svm_get_enable_apicv(void)
 {
-	return false;
+	return avic;
+}
+
+static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
 }
 
+static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
+{
+}
+
+/* Note: Currently only used by Hyper-V. */
 static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = svm->vmcb;
+
+	if (!avic)
+		return;
+
+	if (!svm->in_kernel_lapic_regs)
+		return;
+
+	svm->vcpu.arch.apic->regs = svm->in_kernel_lapic_regs;
+	vmcb->control.avic_enable = 0;
 }
 
 static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
@@ -4387,6 +4799,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.sync_pir_to_irr = svm_sync_pir_to_irr,
+	.hwapic_irr_update = svm_hwapic_irr_update,
+	.hwapic_isr_update = svm_hwapic_isr_update,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (4 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 15:36   ` Paolo Bonzini
  2016-03-04 20:46 ` [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

This patch introduces a new mechanism for injecting interrupts via AVIC.
Since VINTR is not supported when AVIC is enabled, interrupts must be
injected via the vAPIC backing page instead.

This patch also adds support for AVIC doorbell, which is used by
KVM to signal a running vcpu to check IRR for injected interrupts.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c703149..8f11200 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -71,6 +71,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define SVM_FEATURE_DECODE_ASSIST  (1 <<  7)
 #define SVM_FEATURE_PAUSE_FILTER   (1 << 10)
 
+#define SVM_AVIC_DOORBELL	0xc001011b
+
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
 #define NESTED_EXIT_DONE	1	/* Exit caused nested vmexit  */
 #define NESTED_EXIT_CONTINUE	2	/* Further checks needed      */
@@ -236,6 +238,8 @@ static bool npt_enabled = true;
 static bool npt_enabled;
 #endif
 
+static struct kvm_x86_ops svm_x86_ops;
+
 /* allow nested paging (virtualized MMU) for all guests */
 static int npt = true;
 module_param(npt, int, S_IRUGO);
@@ -978,6 +982,8 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+	} else {
+		svm_x86_ops.deliver_posted_interrupt = NULL;
 	}
 
 	return 0;
@@ -3071,8 +3077,10 @@ static int clgi_interception(struct vcpu_svm *svm)
 	disable_gif(svm);
 
 	/* After a CLGI no interrupts should come */
-	svm_clear_vintr(svm);
-	svm->vmcb->control.v_irq = 0;
+	if (!avic) {
+		svm_clear_vintr(svm);
+		svm->vmcb->control.v_irq = 0;
+	}
 
 	mark_dirty(svm->vmcb, VMCB_INTR);
 
@@ -3647,6 +3655,9 @@ static int msr_interception(struct vcpu_svm *svm)
 
 static int interrupt_window_interception(struct vcpu_svm *svm)
 {
+	BUG_ON(avic);
+
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	svm_clear_vintr(svm);
 	svm->vmcb->control.v_irq = 0;
@@ -3961,7 +3972,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
 	struct vmcb_control_area *control;
 
-
+	/* The following fields are ignored when AVIC is enabled */
 	control = &svm->vmcb->control;
 	control->int_vector = irq;
 	control->v_intr_prio = 0xf;
@@ -4042,6 +4053,20 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
+
+	if (vcpu->mode == IN_GUEST_MODE) {
+		wrmsrl(SVM_AVIC_DOORBELL,
+		       __default_cpu_present_to_apicid(vcpu->cpu));
+	} else {
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4102,6 +4127,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 	 * get that intercept, this function will be called again though and
 	 * we'll get the vintr intercept.
 	 */
+	if (avic)
+		return;
+
 	if (gif_set(svm) && nested_svm_intr(svm)) {
 		svm_set_vintr(svm);
 		svm_inject_irq(svm, 0x0);
@@ -4834,6 +4862,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.sched_in = svm_sched_in,
 
 	.pmu_ops = &amd_pmu_ops,
+	.deliver_posted_interrupt = svm_deliver_avic_intr,
 };
 
 static int __init svm_init(void)
-- 
1.9.1


* [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (5 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 15:58   ` Paolo Bonzini
  2016-03-09 20:55   ` Radim Krčmář
  2016-03-04 20:46 ` [PART1 RFC v2 08/10] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Introduce VMEXIT handlers avic_incomp_ipi_interception() and
avic_noaccel_interception().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/uapi/asm/svm.h |   9 +-
 arch/x86/kvm/svm.c              | 260 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 268 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8a4add8..ebfdf8d 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -73,6 +73,8 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_NPF           0x400
+#define SVM_EXIT_AVIC_INCMP_IPI 0x401
+#define SVM_EXIT_AVIC_NOACCEL  0x402
 
 #define SVM_EXIT_ERR           -1
 
@@ -107,8 +109,10 @@
 	{ SVM_EXIT_SMI,         "smi" }, \
 	{ SVM_EXIT_INIT,        "init" }, \
 	{ SVM_EXIT_VINTR,       "vintr" }, \
+	{ SVM_EXIT_CR0_SEL_WRITE, "cr0_sel_write" }, \
 	{ SVM_EXIT_CPUID,       "cpuid" }, \
 	{ SVM_EXIT_INVD,        "invd" }, \
+	{ SVM_EXIT_PAUSE,       "pause" }, \
 	{ SVM_EXIT_HLT,         "hlt" }, \
 	{ SVM_EXIT_INVLPG,      "invlpg" }, \
 	{ SVM_EXIT_INVLPGA,     "invlpga" }, \
@@ -127,7 +131,10 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
-	{ SVM_EXIT_NPF,         "npf" }
+	{ SVM_EXIT_NPF,         "npf" }, \
+	{ SVM_EXIT_RSM,         "rsm" }, \
+	{ SVM_EXIT_AVIC_INCMP_IPI, "avic_incomp_ipi" }, \
+	{ SVM_EXIT_AVIC_NOACCEL,   "avic_noaccel" }
 
 
 #endif /* _UAPI__SVM_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8f11200..a177781 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
 	return nop_interception(svm);
 }
 
+enum avic_incmp_ipi_err_code {
+	AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
+	AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
+	AVIC_INCMP_IPI_ERR_INV_TARGET,
+	AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
+};
+
+#define APIC_SHORT_MASK			0xc0000
+#define APIC_DEST_MASK			0x800
+static int avic_incomp_ipi_interception(struct vcpu_svm *svm)
+{
+	u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
+	u32 icrl = svm->vmcb->control.exit_info_1;
+	u32 id = svm->vmcb->control.exit_info_2 >> 32;
+	u32 index = svm->vmcb->control.exit_info_2 & 0xFF;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+	pr_debug("%s: cpu=%#x, vcpu=%#x, icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
+		 __func__, svm->vcpu.cpu, svm->vcpu.vcpu_id,
+		 icrh, icrl, id, index);
+
+	switch (id) {
+	case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE:
+		/*
+		 * AVIC hardware handles the generation of
+		 * IPIs when the specified Message Type is Fixed
+		 * (also known as fixed delivery mode) and
+		 * the Trigger Mode is edge-triggered. The hardware
+		 * also supports self and broadcast delivery modes
+		 * specified via the Destination Shorthand(DSH)
+		 * field of the ICRL. Logical and physical APIC ID
+		 * formats are supported. All other IPI types cause
+		 * a #VMEXIT, which needs to be emulated.
+		 */
+		kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+		kvm_lapic_reg_write(apic, APIC_ICR, icrl);
+		break;
+	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
+		int i;
+		struct kvm_vcpu *vcpu;
+		struct kvm *kvm = svm->vcpu.kvm;
+		struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+		/*
+		 * At this point, we expect that the AVIC HW has already
+		 * set the appropriate IRR bits on the valid target
+		 * vcpus. So, we just need to kick the appropriate vcpu.
+		 */
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			if (!kvm_apic_match_dest(vcpu, apic,
+						 icrl & APIC_SHORT_MASK,
+						 GET_APIC_DEST_FIELD(icrh),
+						 icrl & APIC_DEST_MASK))
+				continue;
+
+			kvm_vcpu_kick(vcpu);
+		}
+		break;
+	}
+	case AVIC_INCMP_IPI_ERR_INV_TARGET:
+		pr_err("%s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
+		       __func__, icrh, icrl, index);
+		BUG();
+		break;
+	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
+		pr_err("%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
+		       __func__, icrh, icrl, index);
+		BUG();
+		break;
+	default:
+		pr_err("Unknown IPI interception\n");
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_trap_write(struct vcpu_svm *svm)
+{
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	struct svm_vm_data *vm_data = svm->vcpu.kvm->arch.arch_data;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+	u32 reg = *avic_get_bk_page_entry(svm, offset);
+
+	pr_debug("%s: offset=%#x, val=%#x, (cpu=%x) (vcpu_id=%x)\n",
+		 __func__, offset, reg, svm->vcpu.cpu, svm->vcpu.vcpu_id);
+
+	switch (offset) {
+	case APIC_ID: {
+		u32 aid = (reg >> 24) & 0xff;
+		struct svm_avic_phy_ait_entry *o_ent =
+				avic_get_phy_ait_entry(&svm->vcpu, svm->vcpu.vcpu_id);
+		struct svm_avic_phy_ait_entry *n_ent =
+				avic_get_phy_ait_entry(&svm->vcpu, aid);
+
+		if (!n_ent || !o_ent)
+			return 0;
+
+		pr_debug("%s: APIC_ID=%#x (id=%x)\n", __func__, reg, aid);
+
+		/* We need to move phy_apic_entry to new offset */
+		*n_ent = *o_ent;
+		*((u64 *)o_ent) = 0ULL;
+		break;
+	}
+	case APIC_LDR: {
+		int ret, lid;
+		int dlid = (reg >> 24) & 0xff;
+
+		if (!dlid)
+			return 0;
+
+		lid = ffs(dlid) - 1;
+		pr_debug("%s: LDR=%0#10x (lid=%x)\n", __func__, reg, lid);
+		ret = avic_init_log_apic_entry(&svm->vcpu, svm->vcpu.vcpu_id,
+					       lid);
+		if (ret)
+			return 0;
+
+		break;
+	}
+	case APIC_DFR: {
+		u32 mod = (*avic_get_bk_page_entry(svm, offset) >> 28) & 0xf;
+
+		pr_debug("%s: DFR=%#x (%s)\n", __func__,
+			 mod, (mod == 0xf) ? "flat" : "cluster");
+
+		/*
+		 * We assume that all local APICs are using the same type.
+		 * If this changes, we need to rebuild the AVIC logical
+		 * APIC ID table with subsequent writes to APIC_LDR.
+		 */
+		if (vm_data->ldr_mode != mod) {
+			clear_page(page_address(vm_data->avic_log_ait_page));
+			vm_data->ldr_mode = mod;
+		}
+		break;
+	}
+	case APIC_TMICT: {
+		u32 val = kvm_apic_get_reg(apic, APIC_TMICT);
+
+		pr_debug("%s: TMICT=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	case APIC_ESR: {
+		u32 val = kvm_apic_get_reg(apic, APIC_ESR);
+
+		pr_debug("%s: ESR=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	case APIC_LVTERR: {
+		u32 val = kvm_apic_get_reg(apic, APIC_LVTERR);
+
+		pr_debug("%s: LVTERR=%#x,%#x\n", __func__, val, reg);
+		break;
+	}
+	default:
+		break;
+	}
+
+	kvm_lapic_reg_write(apic, offset, reg);
+
+	return 1;
+}
+
+static int avic_noaccel_fault_read(struct vcpu_svm *svm)
+{
+	u32 val;
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+	pr_debug("%s: offset=%x\n", __func__, offset);
+
+	switch (offset) {
+	case APIC_TMCCT: {
+		if (kvm_lapic_reg_read(apic, offset, 4, &val))
+			return 0;
+
+		pr_debug("%s: TMCCT: rip=%#lx, next_rip=%#llx, val=%#x)\n",
+			 __func__, kvm_rip_read(&svm->vcpu), svm->next_rip, val);
+
+		*avic_get_bk_page_entry(svm, offset) = val;
+		break;
+	}
+	default:
+		pr_debug("%s: (rip=%#lx), offset=%#x\n", __func__,
+			 kvm_rip_read(&svm->vcpu), offset);
+		break;
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_fault_write(struct vcpu_svm *svm)
+{
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+
+	pr_debug("%s: offset=%x\n", __func__, offset);
+
+	switch (offset) {
+	case APIC_ARBPRI: /* APR: Arbitration Priority Register */
+	case APIC_TMCCT: /* Timer Current Count */
+		/* TODO */
+		break;
+	default:
+		BUG();
+	}
+
+	return 1;
+}
+
+static int avic_noaccel_interception(struct vcpu_svm *svm)
+{
+	int ret = 0;
+	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+	u32 rw = (svm->vmcb->control.exit_info_1 >> 32) & 0x1;
+	u32 vector = svm->vmcb->control.exit_info_2 & 0xFFFFFFFF;
+
+	pr_debug("%s: offset=%#x, rw=%#x, vector=%#x, vcpu_id=%#x, cpu=%#x\n",
+		 __func__, offset, rw, vector, svm->vcpu.vcpu_id, svm->vcpu.cpu);
+
+	BUG_ON(offset >= 0x400);
+
+	switch (offset) {
+	case APIC_ID:
+	case APIC_EOI:
+	case APIC_RRR:
+	case APIC_LDR:
+	case APIC_DFR:
+	case APIC_SPIV:
+	case APIC_ESR:
+	case APIC_ICR:
+	case APIC_LVTT:
+	case APIC_LVTTHMR:
+	case APIC_LVTPC:
+	case APIC_LVT0:
+	case APIC_LVT1:
+	case APIC_LVTERR:
+	case APIC_TMICT:
+	case APIC_TDCR: {
+		/* Handling Trap */
+		if (!rw) /* Trap reads should never happen */
+			BUG();
+		ret = avic_noaccel_trap_write(svm);
+		break;
+	}
+	default: {
+		/* Handling Fault */
+		if (rw)
+			ret = avic_noaccel_fault_write(svm);
+		else
+			ret = avic_noaccel_fault_read(svm);
+		skip_emulated_instruction(&svm->vcpu);
+	}
+	}
+
+	return ret;
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3753,6 +4011,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_NPF]				= pf_interception,
 	[SVM_EXIT_RSM]                          = emulate_on_interception,
+	[SVM_EXIT_AVIC_INCMP_IPI]		= avic_incomp_ipi_interception,
+	[SVM_EXIT_AVIC_NOACCEL]			= avic_noaccel_interception,
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
-- 
1.9.1


* [PART1 RFC v2 08/10] svm: Do not expose x2APIC when enable AVIC
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (6 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-04 20:46 ` [PART1 RFC v2 09/10] svm: Do not intercept CR8 " Suravee Suthikulpanit
  2016-03-04 20:46 ` [PART1 RFC v2 10/10] svm: Manage vcpu load/unload " Suravee Suthikulpanit
  9 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Since AVIC only virtualizes xAPIC hardware for the guest, we need to:
    * Intercept APIC BAR msr accesses to disable x2APIC
    * Intercept CPUID access to not advertise x2APIC support
    * Hide x2APIC support when checking via KVM ioctl

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a177781..02cd8d0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -228,6 +228,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
+	{ .index = MSR_IA32_APICBASE,			.always = false },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
@@ -855,6 +856,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
 		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
 	}
+
+	if (avic)
+		set_msr_interception(msrpm, MSR_IA32_APICBASE, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -3490,6 +3494,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = 0x1E;
 		}
 		break;
+	case MSR_IA32_APICBASE:
+		if (avic) {
+			/* Note:
+			 * For AVIC, we need to disable X2APIC
+			 * and enable XAPIC
+			 */
+			kvm_get_msr_common(vcpu, msr_info);
+			msr_info->data &= ~X2APIC_ENABLE;
+			msr_info->data |= XAPIC_ENABLE;
+			break;
+		}
+		/* Fall through if AVIC is not enabled */
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -3618,6 +3634,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_VM_IGNNE:
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_IA32_APICBASE:
+		if (avic)
+			avic_update_vapic_bar(to_svm(vcpu), data);
+		/* Fall through */
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
@@ -4754,11 +4774,26 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 
 	/* Update nrips enabled cache */
 	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
+
+	/* Do not support x2APIC when AVIC is enabled */
+	if (avic) {
+		int i;
+
+		for (i = 0 ; i < vcpu->arch.cpuid_nent ; i++) {
+			if (vcpu->arch.cpuid_entries[i].function == 1)
+				vcpu->arch.cpuid_entries[i].ecx &= ~(1 << 21);
+		}
+	}
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
 	switch (func) {
+	case 0x00000001:
+		/* Do not support x2APIC when AVIC is enabled */
+		if (avic)
+			entry->ecx &= ~(1 << 21);
+		break;
 	case 0x80000001:
 		if (nested)
 			entry->ecx |= (1 << 2); /* Set SVM bit */
-- 
1.9.1


* [PART1 RFC v2 09/10] svm: Do not intercept CR8 when enable AVIC
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (7 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 08/10] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-07 15:39   ` Paolo Bonzini
  2016-03-04 20:46 ` [PART1 RFC v2 10/10] svm: Manage vcpu load/unload " Suravee Suthikulpanit
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit,
	Suravee Suthikulpanit

When AVIC is enabled:
    * Do not intercept CR8, since this is handled by the AVIC hardware.
    * Update the TPR in the vAPIC backing page when syncing CR8 before
      VMRUN.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 02cd8d0..5142861 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -986,6 +986,9 @@ static __init int svm_hardware_setup(void)
 
 	if (avic) {
 		printk(KERN_INFO "kvm: AVIC enabled\n");
+
+		/* Do not intercept CR8 if AVIC is enabled. */
+		svm_x86_ops.update_cr8_intercept = NULL;
 	} else {
 		svm_x86_ops.deliver_posted_interrupt = NULL;
 	}
@@ -1116,7 +1119,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
-	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+	if (!avic)
+		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -4277,7 +4281,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
+	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
+	     avic)
 		return;
 
 	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
@@ -4472,8 +4477,10 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
 		return;
 
-	cr8 = kvm_get_cr8(vcpu);
-	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
+	svm->vmcb->control.v_tpr = cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
+
+	if (avic)
+		*(avic_get_bk_page_entry(svm, APIC_TASKPRI)) = (u32)cr8 << 4;
 }
 
 static void svm_complete_interrupts(struct vcpu_svm *svm)
-- 
1.9.1


* [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
                   ` (8 preceding siblings ...)
  2016-03-04 20:46 ` [PART1 RFC v2 09/10] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-03-04 20:46 ` Suravee Suthikulpanit
  2016-03-09 21:46   ` Radim Krčmář
  9 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-04 20:46 UTC (permalink / raw)
  To: pbonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz, Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

When a vCPU is loaded on or unloaded from a physical core, the
corresponding entry in the Physical APIC ID table must be updated
accordingly.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5142861..ebcade0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -35,6 +35,7 @@
 #include <linux/trace_events.h>
 #include <linux/slab.h>
 
+#include <asm/apic.h>
 #include <asm/perf_event.h>
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
@@ -175,6 +176,7 @@ struct vcpu_svm {
 
 	struct page *avic_bk_page;
 	void *in_kernel_lapic_regs;
+	bool avic_was_running;
 };
 
 struct __attribute__ ((__packed__))
@@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
 	return 0;
 }
 
+static inline int
+avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
+{
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	/* TODO: We will hook up with the IOMMU API at a later time. */
+	return 0;
+}
+
+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
+{
+	int g_phy_apic_id, h_phy_apic_id;
+	struct svm_avic_phy_ait_entry *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret = 0;
+
+	if (!avic)
+		return 0;
+
+	if (!svm)
+		return -EINVAL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	g_phy_apic_id = vcpu->vcpu_id;
+	h_phy_apic_id = __default_cpu_present_to_apicid(cpu);
+
+	if ((g_phy_apic_id >= AVIC_PHY_APIC_ID_MAX) ||
+	    (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX))
+		return -EINVAL;
+
+	entry = avic_get_phy_ait_entry(vcpu, g_phy_apic_id);
+	if (!entry)
+		return -EINVAL;
+
+	if (is_load) {
+		/* Handle vcpu load */
+		phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+		new_entry = READ_ONCE(*entry);
+
+		BUG_ON(new_entry.is_running);
+
+		new_entry.bk_pg_ptr = (pa >> 12) & 0xffffffffff;
+		new_entry.valid = 1;
+		new_entry.host_phy_apic_id = h_phy_apic_id;
+
+		if (svm->avic_was_running) {
+			/**
+			 * Restore AVIC running flag if it was set during
+			 * vcpu unload.
+			 */
+			new_entry.is_running = 1;
+		}
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa,
+					   svm->avic_was_running);
+		WRITE_ONCE(*entry, new_entry);
+
+	} else {
+		/* Handle vcpu unload */
+		new_entry = READ_ONCE(*entry);
+		if (new_entry.is_running) {
+			phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+			/**
+			 * This handles the case when vcpu is scheduled out
+			 * and has not yet called blocking. We save the
+			 * AVIC running flag so that we can restore it later.
+			 */
+			svm->avic_was_running = true;
+
+			/**
+			 * We need to also clear the AVIC running flag
+			 * and communicate the changes to the IOMMU.
+			 */
+			ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, 0);
+
+			new_entry.is_running = 0;
+			WRITE_ONCE(*entry, new_entry);
+		} else {
+			svm->avic_was_running = false;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * This function is called during VCPU halt/unhalt.
+ */
+static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+	int ret = 0;
+	int g_phy_apic_id, h_phy_apic_id;
+	struct svm_avic_phy_ait_entry *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+	if (!avic)
+		return 0;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	g_phy_apic_id = vcpu->vcpu_id;
+	h_phy_apic_id = __default_cpu_present_to_apicid(vcpu->cpu);
+
+	if ((g_phy_apic_id >= AVIC_PHY_APIC_ID_MAX) ||
+	    (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX))
+		return -EINVAL;
+
+	entry = avic_get_phy_ait_entry(vcpu, g_phy_apic_id);
+	if (!entry)
+		return -EINVAL;
+
+	if (is_run) {
+		/**
+		 * Handle vcpu unblocking after HLT
+		 */
+		new_entry = READ_ONCE(*entry);
+		new_entry.is_running = is_run;
+		WRITE_ONCE(*entry, new_entry);
+
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, is_run);
+	} else {
+		/**
+		 * Handle vcpu blocking due to HLT
+		 */
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, is_run);
+
+		new_entry = READ_ONCE(*entry);
+		new_entry.is_running = is_run;
+		WRITE_ONCE(*entry, new_entry);
+	}
+
+	return ret;
+}
+
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1648,6 +1790,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	/* This assumes that the kernel never uses MSR_TSC_AUX */
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+
+	avic_vcpu_load(vcpu, cpu, true);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1655,6 +1799,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	int i;
 
+	avic_vcpu_load(vcpu, 0, false);
+
 	++vcpu->stat.host_state_reload;
 	kvm_load_ldt(svm->host.ldt);
 #ifdef CONFIG_X86_64
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-04 20:46 ` [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
@ 2016-03-07 15:36   ` Paolo Bonzini
  2016-03-08 21:54     ` Radim Krčmář
  2016-03-14  5:25     ` Suravee Suthikulpanit
  0 siblings, 2 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 15:36 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
> +
> +	if (vcpu->mode == IN_GUEST_MODE) {
> +		wrmsrl(SVM_AVIC_DOORBELL,
> +		       __default_cpu_present_to_apicid(vcpu->cpu));
> +	} else {
> +		kvm_vcpu_kick(vcpu);
> +	}

You also need to add

	kvm_make_request(KVM_REQ_EVENT, vcpu);

before the "if", similar to vmx_deliver_posted_interrupt.

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 09/10] svm: Do not intercept CR8 when enable AVIC
  2016-03-04 20:46 ` [PART1 RFC v2 09/10] svm: Do not intercept CR8 " Suravee Suthikulpanit
@ 2016-03-07 15:39   ` Paolo Bonzini
  2016-03-14  6:09     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 15:39 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
> +
> +		/* Do not do cr8 intercept if AVIC is enabled. */

No need for this comment.

> +		svm_x86_ops.update_cr8_intercept = NULL;
>  	} else {
>  		svm_x86_ops.deliver_posted_interrupt = NULL;
>  	}
> @@ -1116,7 +1119,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
>  	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
> -	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> +	if (!avic)

Remember that AVIC enabled/disabled must be refreshed when the
.refresh_apicv_exec_ctrl callback is invoked, so it is not enough to use
the global variable.

Paolo

> +		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>  
>  	set_dr_intercepts(svm);
>  
> @@ -4277,7 +4281,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> +	if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
> +	     avic)
>  		return;
>  
>  	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> @@ -4472,8 +4477,10 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>  		return;
>  
> -	cr8 = kvm_get_cr8(vcpu);
> -	svm->vmcb->control.v_tpr = cr8 & V_TPR_MASK;
> +	svm->vmcb->control.v_tpr = cr8 = kvm_get_cr8(vcpu) & V_TPR_MASK;
> +
> +	if (avic)
> +		*(avic_get_bk_page_entry(svm, APIC_TASKPRI)) = (u32)cr8 << 4;

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking
  2016-03-04 20:46 ` [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking Suravee Suthikulpanit
@ 2016-03-07 15:42   ` Paolo Bonzini
  2016-03-14  6:19     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 15:42 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>  
> +static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
> +{
> +	avic_set_running(vcpu, false);
> +}
> +
> +static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
> +{
> +	avic_set_running(vcpu, true);
> +}
> +
>  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
>  {
>  	return to_svm(vcpu)->vmcb->save.rflags;
> @@ -4321,6 +4331,8 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.prepare_guest_switch = svm_prepare_guest_switch,
>  	.vcpu_load = svm_vcpu_load,
>  	.vcpu_put = svm_vcpu_put,
> +	.vcpu_blocking = svm_vcpu_blocking,
> +	.vcpu_unblocking = svm_vcpu_unblocking,

These two hunks should be added to patch 10.

Paolo

>  	.update_bp_intercept = update_bp_intercept,
>  	.get_msr = svm_get_msr,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers
  2016-03-04 20:46 ` [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
@ 2016-03-07 15:44   ` Paolo Bonzini
  2016-03-14  7:41     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 15:44 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> Introduce new AVIC VMCB registers. Also breakdown int_ctl register
> into bit-field for ease of use.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  arch/x86/include/asm/svm.h | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 6136d99..db5d7af 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -67,10 +67,24 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u32 asid;
>  	u8 tlb_ctl;
>  	u8 reserved_2[3];
> -	u32 int_ctl;
> +	union { /* Offset 0x60 */
> +		u32 int_ctl;
> +
> +		struct __attribute__ ((__packed__)) {
> +			u32 v_tpr		: 8,
> +			    v_irq		: 1,
> +			    reserved_3		: 7,
> +			    v_intr_prio		: 4,
> +			    v_ign_tpr		: 1,
> +			    reserved_4		: 3,
> +			    v_intr_masking	: 1,
> +			    reserved_5		: 6,
> +			    avic_enable		: 1;

Please do not introduce bitfields and drop patch 4.

Thanks,

Paolo

> +		};
> +	};
>  	u32 int_vector;
>  	u32 int_state;
> -	u8 reserved_3[4];
> +	u8 reserved_6[4];
>  	u32 exit_code;
>  	u32 exit_code_hi;
>  	u64 exit_info_1;
> @@ -78,17 +92,22 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u32 exit_int_info;
>  	u32 exit_int_info_err;
>  	u64 nested_ctl;
> -	u8 reserved_4[16];
> +	u64 avic_vapic_bar;
> +	u8 reserved_7[8];
>  	u32 event_inj;
>  	u32 event_inj_err;
>  	u64 nested_cr3;
>  	u64 lbr_ctl;
>  	u32 clean;
> -	u32 reserved_5;
> +	u32 reserved_8;
>  	u64 next_rip;
>  	u8 insn_len;
>  	u8 insn_bytes[15];
> -	u8 reserved_6[800];
> +	u64 avic_bk_page;	/* Offset 0xe0 */
> +	u8 reserved_9[8];	/* Offset 0xe8 */
> +	u64 avic_log_apic_id;	/* Offset 0xf0 */
> +	u64 avic_phy_apic_id;	/* Offset 0xf8 */
> +	u8 reserved_10[768];
>  };
>  
>  
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-04 20:46 ` [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
@ 2016-03-07 15:58   ` Paolo Bonzini
  2016-03-08 22:05     ` Radim Krčmář
  2016-03-09 20:55   ` Radim Krčmář
  1 sibling, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 15:58 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
> +#define SVM_EXIT_AVIC_INCMP_IPI 0x401
> +#define SVM_EXIT_AVIC_NOACCEL  0x402
> 
> +enum avic_incmp_ipi_err_code {
> +	AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
> +	AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
> +	AVIC_INCMP_IPI_ERR_INV_TARGET,
> +	AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
> +};
> +

Please do not abbreviate names; use the same definitions as the manual.
These are "IPI delivery failure causes", which we can shorten to
"IPI failure causes".  Putting it together gives:

#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402

enum avic_ipi_failure_cause {
	AVIC_IPI_FAILURE_INVALID_INT_TYPE,
	AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
	AVIC_IPI_FAILURE_INVALID_TARGET,
	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
};

Likewise, do not abbreviate function names.

> +#define APIC_SHORT_MASK			0xc0000
> +#define APIC_DEST_MASK			0x800

Please move these to lapic.h too in patch 1.

> +	case AVIC_INCMP_IPI_ERR_INV_TARGET:
> +		pr_err("%s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
> +		       __func__, icrh, icrl, index);
> +		BUG();
> +		break;
> +	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
> +		pr_err("%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
> +		       __func__, icrh, icrl, index);
> +		BUG();
> +		break;

Please use WARN(1, "%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
__func__, icrh, icrl, index) (and likewise for invalid target) instead
of BUG().

> 
> +	pr_debug("%s: offset=%#x, val=%#x, (cpu=%x) (vcpu_id=%x)\n",
> +		 __func__, offset, reg, svm->vcpu.cpu, svm->vcpu.vcpu_id);
> +
> +	switch (offset) {
> +	case APIC_ID: {
> +		u32 aid = (reg >> 24) & 0xff;
> +		struct svm_avic_phy_ait_entry *o_ent =
> +				avic_get_phy_ait_entry(&svm->vcpu, svm->vcpu.vcpu_id);
> +		struct svm_avic_phy_ait_entry *n_ent =
> +				avic_get_phy_ait_entry(&svm->vcpu, aid);
> +
> +		if (!n_ent || !o_ent)
> +			return 0;
> +
> +		pr_debug("%s: APIC_ID=%#x (id=%x)\n", __func__, reg, aid);
> +
> +		/* We need to move phy_apic_entry to new offset */
> +		*n_ent = *o_ent;
> +		*((u64 *)o_ent) = 0ULL;
> +		break;
> +	}
> +	case APIC_LDR: {
> +		int ret, lid;
> +		int dlid = (reg >> 24) & 0xff;
> +
> +		if (!dlid)
> +			return 0;
> +
> +		lid = ffs(dlid) - 1;
> +		pr_debug("%s: LDR=%0#10x (lid=%x)\n", __func__, reg, lid);
> +		ret = avic_init_log_apic_entry(&svm->vcpu, svm->vcpu.vcpu_id,
> +					       lid);
> +		if (ret)
> +			return 0;
> +
> +		break;
> +	}
> +	case APIC_DFR: {
> +		u32 mod = (*avic_get_bk_page_entry(svm, offset) >> 28) & 0xf;
> +
> +		pr_debug("%s: DFR=%#x (%s)\n", __func__,
> +			 mod, (mod == 0xf) ? "flat" : "cluster");
> +
> +		/*
> +		 * We assume that all local APICs are using the same type.
> +		 * If this changes, we need to rebuild the AVIC logical
> +		 * APIC ID table with a subsequent write to APIC_LDR.
> +		 */
> +		if (vm_data->ldr_mode != mod) {
> +			clear_page(page_address(vm_data->avic_log_ait_page));
> +			vm_data->ldr_mode = mod;
> +		}
> +		break;
> +	}
> +	case APIC_TMICT: {
> +		u32 val = kvm_apic_get_reg(apic, APIC_TMICT);
> +
> +		pr_debug("%s: TMICT=%#x,%#x\n", __func__, val, reg);
> +		break;
> +	}
> +	case APIC_ESR: {
> +		u32 val = kvm_apic_get_reg(apic, APIC_ESR);
> +
> +		pr_debug("%s: ESR=%#x,%#x\n", __func__, val, reg);
> +		break;
> +	}
> +	case APIC_LVTERR: {
> +		u32 val = kvm_apic_get_reg(apic, APIC_LVTERR);
> +
> +		pr_debug("%s: LVTERR=%#x,%#x\n", __func__, val, reg);
> +		break;
> +	}
> +	default:
> +		break;

Please use a single tracepoint instead of all these pr_debug statements.
 The tracepoint can convert APIC register offsets to APIC register
names, like the existing kvm_apic tracepoint.  Also please remove the
TMICT/ESR/LVTERR cases, since they do nothing but debugging.

Notice how a single tracepoint can be used for multiple functions, the
example being kvm_avic as well.

Existing tracepoints in fact make most of the pr_debug statements
unnecessary.

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-04 20:46 ` [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
@ 2016-03-07 16:41   ` Paolo Bonzini
  2016-03-15 17:09     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-07 16:41 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 04/03/2016 21:46, Suravee Suthikulpanit wrote:

> @@ -162,6 +170,37 @@ struct vcpu_svm {
>  
>  	/* cached guest cpuid flags for faster access */
>  	bool nrips_enabled	: 1;
> +
> +	struct page *avic_bk_page;
> +	void *in_kernel_lapic_regs;
> +};
> +
> +struct __attribute__ ((__packed__))
> +svm_avic_log_ait_entry {
> +	u32 guest_phy_apic_id	: 8,
> +	    res			: 23,
> +	    valid		: 1;
> +};
> +
> +struct __attribute__ ((__packed__))
> +svm_avic_phy_ait_entry {
> +	u64 host_phy_apic_id	: 8,
> +	    res1		: 4,
> +	    bk_pg_ptr		: 40,
> +	    res2		: 10,
> +	    is_running		: 1,
> +	    valid		: 1;
> +};

Please don't use bitfields.

> +/* Note: This structure is per VM */
> +struct svm_vm_data {
> +	atomic_t count;
> +	u32 ldr_mode;
> +	u32 avic_max_vcpu_id;
> +	u32 avic_tag;
> +
> +	struct page *avic_log_ait_page;
> +	struct page *avic_phy_ait_page;

You can put these directly in kvm_arch.  Do not use abbreviations:

	struct page *avic_logical_apic_id_table_page;
	struct page *avic_physical_apic_id_table_page;

> @@ -1000,6 +1057,41 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
>  	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
>  
> +static void avic_init_vmcb(struct vcpu_svm *svm)
> +{
> +	void *vapic_bkpg;
> +	struct vmcb *vmcb = svm->vmcb;
> +	struct svm_vm_data *vm_data = svm->vcpu.kvm->arch.arch_data;
> +	phys_addr_t bpa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
> +	phys_addr_t lpa = PFN_PHYS(page_to_pfn(vm_data->avic_log_ait_page));
> +	phys_addr_t ppa = PFN_PHYS(page_to_pfn(vm_data->avic_phy_ait_page));

Please use page_to_phys.

> +	if (!vmcb)
> +		return;
> +
> +	pr_debug("%s: bpa=%#llx, lpa=%#llx, ppa=%#llx\n",
> +		 __func__, (unsigned long long)bpa,
> +		 (unsigned long long)lpa, (unsigned long long)ppa);

No pr_debug.

> +
> +	/*
> +	 * Here we copy the value from the emulated LAPIC register
> +	 * to the vAPIC backing page, and then flip the regs frame.
> +	 */
> +	if (!svm->in_kernel_lapic_regs)
> +		svm->in_kernel_lapic_regs = svm->vcpu.arch.apic->regs;
> +
> +	vapic_bkpg = pfn_to_kaddr(page_to_pfn(svm->avic_bk_page));

Please use kmap instead.

> +	memcpy(vapic_bkpg, svm->in_kernel_lapic_regs, PAGE_SIZE);
> +	svm->vcpu.arch.apic->regs = vapic_bkpg;

Can you explain the flipping logic, and why you cannot just use the
existing apic.regs?

> +	vmcb->control.avic_bk_page = bpa & AVIC_HPA_MASK;

No abbreviations ("avic_backing_page").

> +	vmcb->control.avic_log_apic_id = lpa & AVIC_HPA_MASK;
> +	vmcb->control.avic_phy_apic_id = ppa & AVIC_HPA_MASK;
> +	vmcb->control.avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
> +	vmcb->control.avic_enable = 1;
> +}
> +
>  static void init_vmcb(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -1113,6 +1205,293 @@ static void init_vmcb(struct vcpu_svm *svm)
>  	mark_all_dirty(svm->vmcb);
>  
>  	enable_gif(svm);
> +
> +	if (avic)
> +		avic_init_vmcb(svm);
> +}
> +
> +static struct svm_avic_phy_ait_entry *
> +avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
> +{
> +	struct svm_avic_phy_ait_entry *avic_phy_ait;
> +	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
> +
> +	if (!vm_data)
> +		return NULL;
> +
> +	/* Note: APIC ID = 0xff is used for broadcast.
> +	 *       APIC ID > 0xff is reserved.
> +	 */
> +	if (index >= 0xff)
> +		return NULL;
> +
> +	avic_phy_ait = page_address(vm_data->avic_phy_ait_page);
> +
> +	return &avic_phy_ait[index];
> +}
> +
> +struct svm_avic_log_ait_entry *
> +avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
> +{
> +	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
> +	int index;
> +	struct svm_avic_log_ait_entry *avic_log_ait;
> +
> +	if (!vm_data)
> +		return NULL;
> +
> +	if (is_flat) { /* flat */
> +		if (mda > 7)
> +			return NULL;
> +		index = mda;
> +	} else { /* cluster */
> +		int apic_id = mda & 0xf;
> +		int cluster_id = (mda & 0xf0) >> 8;
> +
> +		if (apic_id > 4 || cluster_id >= 0xf)
> +			return NULL;
> +		index = (cluster_id << 2) + apic_id;
> +	}
> +	avic_log_ait = (struct svm_avic_log_ait_entry *)
> +				page_address(vm_data->avic_log_ait_page);
> +
> +	return &avic_log_ait[index];
> +}

Instead of these functions, create a complete function to handle APIC_ID
and APIC_LDR writes.  Then use kmap/kunmap instead of page_address.

> +static inline void
> +avic_set_bk_page_entry(struct vcpu_svm *svm, int reg_off, u32 val)
> +{
> +	void *avic_bk = page_address(svm->avic_bk_page);
> +
> +	*((u32 *) (avic_bk + reg_off)) = val;
> +}

Unused function.

> +static inline u32 *avic_get_bk_page_entry(struct vcpu_svm *svm, u32 offset)
> +{
> +	char *tmp = (char *)page_address(svm->avic_bk_page);
> +
> +	return (u32 *)(tmp+offset);
> +}

I think you should be able to use kvm_apic_get_reg instead.

> +static int avic_init_bk_page(struct kvm_vcpu *vcpu)

No abbreviations (but again, please try reusing apic.regs).

> +static int avic_alloc_bk_page(struct vcpu_svm *svm, int id)
> +{
> +	int ret = 0, i;
> +	bool realloc = false;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm *kvm = svm->vcpu.kvm;
> +	struct svm_vm_data *vm_data = kvm->arch.arch_data;
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	/* Check if we have already allocated vAPIC backing
> +	 * page for this vCPU. If not, we need to realloc
> +	 * a new one and re-assign all other vCPUs.
> +	 */
> +	if (kvm->arch.apic_access_page_done &&
> +	    (id > vm_data->avic_max_vcpu_id)) {
> +		kvm_for_each_vcpu(i, vcpu, kvm)
> +			avic_unalloc_bk_page(vcpu);
> +
> +		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> +					0, 0);
> +		realloc = true;
> +		vm_data->avic_max_vcpu_id = 0;
> +	}
> +
> +	/*
> +	 * We are allocating vAPIC backing page
> +	 * up to the max vCPU ID
> +	 */
> +	if (id >= vm_data->avic_max_vcpu_id) {
> +		ret = __x86_set_memory_region(kvm,
> +					      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> +					      APIC_DEFAULT_PHYS_BASE,
> +					      PAGE_SIZE * (id + 1));

Why is this necessary?  The APIC access page is a peculiarity of Intel
processors (and the special memslot only needs to map 0xfee00000 to
0xfee00fff; after that there is the MSI area).

> +		if (ret)
> +			goto out;
> +
> +		vm_data->avic_max_vcpu_id = id;
> +	}
> +
> +	/* Reinit vAPIC backing page for existing vcpus */
> +	if (realloc)
> +		kvm_for_each_vcpu(i, vcpu, kvm)
> +			avic_init_bk_page(vcpu);

Why is this necessary?

> +	avic_init_bk_page(&svm->vcpu);
> +
> +	kvm->arch.apic_access_page_done = true;
> +
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return ret;
> +}
> +
> +static void avic_vm_uninit(struct kvm *kvm)
> +{
> +	struct svm_vm_data *vm_data = kvm->arch.arch_data;
> +
> +	if (!vm_data)
> +		return;
> +
> +	if (vm_data->avic_log_ait_page)
> +		__free_page(vm_data->avic_log_ait_page);
> +	if (vm_data->avic_phy_ait_page)
> +		__free_page(vm_data->avic_phy_ait_page);
> +	kfree(vm_data);
> +	kvm->arch.arch_data = NULL;
> +}
> +
> +static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_vm_data *vm_data = vcpu->kvm->arch.arch_data;
> +
> +	if (!avic)
> +		return;
> +
> +	avic_unalloc_bk_page(vcpu);
> +	svm->vcpu.arch.apic->regs = svm->in_kernel_lapic_regs;
> +	svm->in_kernel_lapic_regs = NULL;
> +
> +	if (vm_data &&
> +	    (atomic_read(&vm_data->count) == 0 ||
> +	     atomic_dec_and_test(&vm_data->count)))
> +		avic_vm_uninit(vcpu->kvm);

Add a new kvm_x86_ops callback .free_vcpus and call it from
kvm_free_vcpus; then count is not necessary.

We probably should use it for Intel too.  The calls to
x86_set_memory_region in kvm_arch_destroy_vm are hideous.

> +}
> +
> +static atomic_t avic_tag_gen = ATOMIC_INIT(1);
> +
> +static inline u32 avic_get_next_tag(void)
> +{
> +	u32 tag = atomic_read(&avic_tag_gen);
> +
> +	atomic_inc(&avic_tag_gen);
> +	return tag;
> +}
> +
> +static int avic_vm_init(struct kvm *kvm)
> +{
> +	int err = -ENOMEM;
> +	struct svm_vm_data *vm_data;
> +	struct page *avic_phy_ait_page;
> +	struct page *avic_log_ait_page;

This is probably also best moved to a new callback .init_vm (called from
kvm_arch_init_vm).

> +	vm_data = kzalloc(sizeof(struct svm_vm_data),
> +				      GFP_KERNEL);
> +	if (!vm_data)
> +		return err;
> +
> +	kvm->arch.arch_data = vm_data;
> +	atomic_set(&vm_data->count, 0);
> +
> +	/* Allocating physical APIC ID table (4KB) */
> +	avic_phy_ait_page = alloc_page(GFP_KERNEL);
> +	if (!avic_phy_ait_page)
> +		goto free_avic;
> +
> +	vm_data->avic_phy_ait_page = avic_phy_ait_page;
> +	clear_page(page_address(avic_phy_ait_page));

You can use get_zeroed_page and free_page, instead of
alloc_page/clear_page/__free_page.

Thanks,

Paolo

> +	/* Allocating logical APIC ID table (4KB) */
> +	avic_log_ait_page = alloc_page(GFP_KERNEL);
> +	if (!avic_log_ait_page)
> +		goto free_avic;
> +
> +	vm_data->avic_log_ait_page = avic_log_ait_page;
> +	clear_page(page_address(avic_log_ait_page));
> +
> +	vm_data->avic_tag = avic_get_next_tag();
> +
> +	return 0;
> +
> +free_avic:
> +	avic_vm_uninit(kvm);
> +	return err;
> +}
> +
> +static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
> +{
> +	int err;
> +	struct svm_vm_data *vm_data = NULL;
> +
> +	/* Note: svm_vm_data is per VM */
> +	if (!kvm->arch.arch_data) {
> +		err = avic_vm_init(kvm);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = avic_alloc_bk_page(svm, id);
> +	if (err) {
> +		avic_vcpu_uninit(&svm->vcpu);
> +		return err;
> +	}
> +
> +	vm_data = kvm->arch.arch_data;
> +	atomic_inc(&vm_data->count);
> +
> +	return 0;
>  }
>  
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> @@ -1131,6 +1510,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  
>  	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>  	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> +
> +	if (avic && !init_event)
> +		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
>  }
>  
>  static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
> @@ -1169,6 +1551,12 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
>  	if (!hsave_page)
>  		goto free_page3;
>  
> +	if (avic) {
> +		err = avic_vcpu_init(kvm, svm, id);
> +		if (err)
> +			goto free_page4;
> +	}
> +
>  	svm->nested.hsave = page_address(hsave_page);
>  
>  	svm->msrpm = page_address(msrpm_pages);
> @@ -1187,6 +1575,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
>  
>  	return &svm->vcpu;
>  
> +free_page4:
> +	__free_page(hsave_page);
>  free_page3:
>  	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
>  free_page2:
> @@ -1209,6 +1599,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>  	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
>  	__free_page(virt_to_page(svm->nested.hsave));
>  	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> +	avic_vcpu_uninit(vcpu);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, svm);
>  }
> @@ -3382,6 +3773,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
>  	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
>  	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
>  	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
> +	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
>  	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
>  	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
>  	pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
> @@ -3613,11 +4005,31 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  
>  static bool svm_get_enable_apicv(void)
>  {
> -	return false;
> +	return avic;
> +}
> +
> +static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> +{
>  }
>  
> +static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
> +{
> +}
> +
> +/* Note: Currently only used by Hyper-V. */
>  static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = svm->vmcb;
> +
> +	if (!avic)
> +		return;
> +
> +	if (!svm->in_kernel_lapic_regs)
> +		return;
> +
> +	svm->vcpu.arch.apic->regs = svm->in_kernel_lapic_regs;
> +	vmcb->control.avic_enable = 0;
>  }
>  
>  static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> @@ -4387,6 +4799,8 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
>  	.load_eoi_exitmap = svm_load_eoi_exitmap,
>  	.sync_pir_to_irr = svm_sync_pir_to_irr,
> +	.hwapic_irr_update = svm_hwapic_irr_update,
> +	.hwapic_isr_update = svm_hwapic_isr_update,
>  
>  	.set_tss_addr = svm_set_tss_addr,
>  	.get_tdp_level = get_npt_level,
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-07 15:36   ` Paolo Bonzini
@ 2016-03-08 21:54     ` Radim Krčmář
  2016-03-09 11:10       ` Paolo Bonzini
  2016-03-14  5:25     ` Suravee Suthikulpanit
  1 sibling, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-08 21:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-03-07 16:36+0100, Paolo Bonzini:
> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));

(I think that smp_mb here would make sense, even though we're fine now
 thanks to re-checking vcpu->mode in kvm_vcpu_kick.
 A comment explaining this optimization would be nice.  I'm thinking
 about a race where we don't send the doorbell even though the VCPU is
 in guest mode, because vcpu->mode was read before writing APIC_IRR.)

>> +
>> +	if (vcpu->mode == IN_GUEST_MODE) {
>> +		wrmsrl(SVM_AVIC_DOORBELL,
>> +		       __default_cpu_present_to_apicid(vcpu->cpu));
>> +	} else {
>> +		kvm_vcpu_kick(vcpu);
>> +	}
> 
> You also need to add
> 
> 	kvm_make_request(KVM_REQ_EVENT, vcpu);
> 
> before the "if", similar to vmx_deliver_posted_interrupt.

KVM won't do anything in KVM_REQ_EVENT and I think that the request can
be avoided because KVM already has to handle IRR writes from AVIC.

And what about
  [...]
  else if (!vcpu->...->is_running)
  	kvm_vcpu_kick(vcpu);

?
The kick isn't needed unless the VCPU is scheduled out.

Or maybe just
  if (vcpu->...->is_running)
    wrmsrl()
  else
    kvm_vcpu_kick();
?
Which doesn't use the information we have on top AVIC, making our logic
a bit simpler.

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-07 15:58   ` Paolo Bonzini
@ 2016-03-08 22:05     ` Radim Krčmář
  2016-03-09 10:56       ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-08 22:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-03-07 16:58+0100, Paolo Bonzini:
>> +	case AVIC_INCMP_IPI_ERR_INV_TARGET:
>> +		pr_err("%s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
>> +		       __func__, icrh, icrl, index);
>> +		BUG();
>> +		break;
>> +	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
>> +		pr_err("%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
>> +		       __func__, icrh, icrl, index);
>> +		BUG();
>> +		break;
> 
> Please use WARN(1, "%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
> __func__, icrh, icrl, index) (and likewise for invalid target) instead
> of BUG().

I think that if we hit one of these, then WARNs would just flood the
log.  I'd prefer WARN_ONCE on AVIC_INCMP_IPI_ERR_INV_BK_PAGE.
(Btw. aren't icr and idx pointless on this error?  and the function
 name should be printed by WARN.)

Invalid target is triggerable by the guest (by sending IPI to a
non-existent LAPIC), so warning log level seems too severe.
pr_info_ratelimited() or nothing would be better.

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-08 22:05     ` Radim Krčmář
@ 2016-03-09 10:56       ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-09 10:56 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz



On 08/03/2016 23:05, Radim Krčmář wrote:
>>> >> +	case AVIC_INCMP_IPI_ERR_INV_TARGET:
>>> >> +		pr_err("%s: Invalid IPI target (icr=%#08x:%08x, idx=%u)\n",
>>> >> +		       __func__, icrh, icrl, index);
>>> >> +		BUG();
>>> >> +		break;
>>> >> +	case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
>>> >> +		pr_err("%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
>>> >> +		       __func__, icrh, icrl, index);
>>> >> +		BUG();
>>> >> +		break;
>> > 
>> > Please use WARN(1, "%s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
>> > __func__, icrh, icrl, index) (and likewise for invalid target) instead
>> > of BUG().
> I think that if we hit one of these, then WARNs would just flood the
> log.  I'd prefer WARN_ONCE on AVIC_INCMP_IPI_ERR_INV_BK_PAGE.
> (Btw. aren't icr and idx pointless on this error?  and the function
>  name should be printed by WARN.)

Agreed.

> Invalid target is triggerable by the guest (by sending IPI to a
> non-existent LAPIC), so warning log level seems too severe.
> pr_info_ratelimited() or nothing would be better.

Definitely should be nothing.

Paolo

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-08 21:54     ` Radim Krčmář
@ 2016-03-09 11:10       ` Paolo Bonzini
  2016-03-09 16:00         ` Radim Krčmář
  2016-03-14  9:50         ` Suravee Suthikulpanit
  0 siblings, 2 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-09 11:10 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz



On 08/03/2016 22:54, Radim Krčmář wrote:
> 2016-03-07 16:36+0100, Paolo Bonzini:
>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>>> +{
>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>> +
>>> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
> 
> (I think that smp_mb here would make sense, even though we're fine now
>  thanks to re-checking vcpu->mode in kvm_vcpu_kick.

Right, though only a smp_mb__after_atomic() is required (which is a
compiler barrier).  It is similarly required in vmx.

Offtopic remark, kvm_make_request() and kvm_check_request() should
probably have a smp_wmb() and a smp_rmb().

>>> +
>>> +	if (vcpu->mode == IN_GUEST_MODE) {
>>> +		wrmsrl(SVM_AVIC_DOORBELL,
>>> +		       __default_cpu_present_to_apicid(vcpu->cpu));
>>> +	} else {
>>> +		kvm_vcpu_kick(vcpu);
>>> +	}
>>
>> You also need to add
>>
>> 	kvm_make_request(KVM_REQ_EVENT, vcpu);
>>
>> before the "if", similar to vmx_deliver_posted_interrupt.
> 
> KVM won't do anything in KVM_REQ_EVENT and I think that the request can
> be avoided because KVM already has to handle IRR writes from AVIC.

Doh, I've made the same remark already and you've given the same answer. :)

> And what about
>   [...]
>   else if (!vcpu->...->is_running)
>   	kvm_vcpu_kick(vcpu);
> 
> ?
> The kick isn't needed unless the VCPU is scheduled out.
> Or maybe just
>   if (vcpu->...->is_running)
>     wrmsrl()
>   else
>     kvm_vcpu_kick();
> ?
> Which doesn't use the information we have on top AVIC, making our logic
> a bit simpler.

Yes, both of this should work.  I like the latter.

In any case, the smp_mb__after_atomic() is necessary.

Paolo

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-09 11:10       ` Paolo Bonzini
@ 2016-03-09 16:00         ` Radim Krčmář
  2016-03-14  9:41           ` Suravee Suthikulpanit
  2016-03-14  9:50         ` Suravee Suthikulpanit
  1 sibling, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-09 16:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-03-09 12:10+0100, Paolo Bonzini:
> On 08/03/2016 22:54, Radim Krčmář wrote:
>> 2016-03-07 16:36+0100, Paolo Bonzini:
>>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>>>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>>>> +{
>>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>>> +
>>>> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
>> 
>> (I think that smp_mb here would make sense, even though we're fine now
>>  thanks to re-checking vcpu->mode in kvm_vcpu_kick.
> 
> Right, though only a smp_mb__after_atomic() is required (which is a
> compiler barrier).  It is similarly required in vmx.

True, kvm_lapic_set_vector uses a lock prefix.

(I thought it behaves like atomic_set, which would require MFENCE for
 correct ordering here ... I don't like smp_mb__after_atomic much
 because of the discrepancy on some atomic operations.)

> Offtopic remark, kvm_make_request() and kvm_check_request() should
> probably have a smp_wmb() and a smp_rmb().

Yeah, noted.

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-04 20:46 ` [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
  2016-03-07 15:58   ` Paolo Bonzini
@ 2016-03-09 20:55   ` Radim Krčmář
  2016-03-10 19:34     ` Radim Krčmář
                       ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Radim Krčmář @ 2016-03-09 20:55 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-04 14:46-0600, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> Introduce VMEXIT handlers, avic_incp_ipi_interception() and
> avic_noaccel_interception().
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
> +	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
> +		kvm_for_each_vcpu(i, vcpu, kvm) {
> +			if (!kvm_apic_match_dest(vcpu, apic,
> +						 icrl & APIC_SHORT_MASK,
> +						 GET_APIC_DEST_FIELD(icrh),
> +						 icrl & APIC_DEST_MASK))
> +				continue;
> +
> +			kvm_vcpu_kick(vcpu);

KVM shouldn't kick VCPUs that are running.  (Imagine a broadcast when
most VCPUs are in guest mode.)

I think a new helper might be useful here: we only want to wake up from
wait queue, but never force VCPU out of guest mode ... kvm_vcpu_kick()
does both.

> +static int avic_noaccel_trap_write(struct vcpu_svm *svm)
> +{
> +	switch (offset) {
> +	case APIC_ID: {
> +	case APIC_LDR: {
> +	case APIC_DFR: {
> +	}

It's not enough to modify the AVIC map here.  Userspace can also change
the APIC page with kvm_vcpu_ioctl_set_lapic, so AVIC would better hook
into some common path.

I think that AVIC map should be connected to recalculate_apic_map() and
'struct kvm_apic_map' as we already have the mode and a coupling of
LAPICs and VCPUs there.

recalculate_apic_map() is currently quite wasteful as it recomputes the
whole map on every change, but its simplicity should be bearable.

> +static int avic_noaccel_interception(struct vcpu_svm *svm)
> +{
> +	int ret = 0;
> +	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
> +	u32 rw = (svm->vmcb->control.exit_info_1 >> 32) & 0x1;

Change "u32 rw" to "bool write"

> +	u32 vector = svm->vmcb->control.exit_info_2 & 0xFFFFFFFF;

and please #define those masks.

> +	pr_debug("%s: offset=%#x, rw=%#x, vector=%#x, vcpu_id=%#x, cpu=%#x\n",
> +		 __func__, offset, rw, vector, svm->vcpu.vcpu_id, svm->vcpu.cpu);
> +
> +	BUG_ON(offset >= 0x400);

These are valid faulting registers, so our implementation has to handle
them.  (And the rule is to never BUG if a recovery is simple.)

> +	switch (offset) {
> +	case APIC_ID:
> +	case APIC_EOI:
> +	case APIC_RRR:
> +	case APIC_LDR:
> +	case APIC_DFR:
> +	case APIC_SPIV:
> +	case APIC_ESR:
> +	case APIC_ICR:
> +	case APIC_LVTT:
> +	case APIC_LVTTHMR:
> +	case APIC_LVTPC:
> +	case APIC_LVT0:
> +	case APIC_LVT1:
> +	case APIC_LVTERR:
> +	case APIC_TMICT:
> +	case APIC_TDCR: {

(Try a helper that returns true/false for trap/fault registers, the code
 might look nicer.)

> +		/* Handling Trap */
> +		if (!rw) /* Trap read should never happens */
> +			BUG();
> +		ret = avic_noaccel_trap_write(svm);
> +		break;
> +	}
> +	default: {
> +		/* Handling Fault */
> +		if (rw)
> +			ret = avic_noaccel_fault_write(svm);
> +		else
> +			ret = avic_noaccel_fault_read(svm);
> +		skip_emulated_instruction(&svm->vcpu);

AVIC doesn't tell us what it wanted to write, so KVM has to emulate the
instruction.

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-04 20:46 ` [PART1 RFC v2 10/10] svm: Manage vcpu load/unload " Suravee Suthikulpanit
@ 2016-03-09 21:46   ` Radim Krčmář
  2016-03-10 14:01     ` Radim Krčmář
  2016-03-14 11:48     ` Suravee Suthikulpanit
  0 siblings, 2 replies; 54+ messages in thread
From: Radim Krčmář @ 2016-03-09 21:46 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-04 14:46-0600, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> When a vcpu is loaded/unloaded to a physical core, we need to update
> information in the Physical APIC-ID table accordingly.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
> +static inline int
> +avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
> +{
> +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> +		return 0;
> +
> +	/* TODO: We will hook up with IOMMU API at later time */

(It'd be best to drop avic_update_iommu from this series and introduce
 it when merging iommu support.)

> +	return 0;
> +}
> +
> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)

This function does a lot and there is only one thing that must be done
in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
The rest can be done elsewhere:
 - is_running when blocking.
 - kb_pg_ptr when the pointer changes = only on initialization?
 - valid when the kb_pg_ptr is valid = always for existing VCPUs?

> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)

avic_set_running should get address of the entry and write is_run to it.
(No need to touch avic_bk_page, h_phy_apic_id or do bound checks.)

I think you can cache the physical apic id table entry in *vcpu, so both
functions are going to be very simple.

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-09 21:46   ` Radim Krčmář
@ 2016-03-10 14:01     ` Radim Krčmář
  2016-03-14 11:58       ` Suravee Suthikulpanit
  2016-03-14 11:48     ` Suravee Suthikulpanit
  1 sibling, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-10 14:01 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-09 22:46+0100, Radim Krčmář:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> 
>> When a vcpu is loaded/unloaded to a physical core, we need to update
>> information in the Physical APIC-ID table accordingly.
>> 
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
> 
> This function does a lot and there is only one thing that must be done
> in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
> The rest can be done elsewhere:
>  - is_running when blocking.

Well, we haven't reached an agreement on is_running yet.  The situation:
if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
gets scheduled on vcpu1's physical core, then vcpu2 would receive a
doorbell intended to vcpu1.

We'd like to keep is_running set when there is no reason to vmexit, but
not if a guest can negatively affect other guests.
How does receiving a stray doorbell affect the performance?

Thanks.

(Toggling is_running on load/put is definitely safer, so it's a good
 choice for first version.)

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-09 20:55   ` Radim Krčmář
@ 2016-03-10 19:34     ` Radim Krčmář
  2016-03-10 19:54       ` Paolo Bonzini
  2016-03-17  3:58     ` Suravee Suthikulpanit
  2016-03-17 19:44     ` Suravee Suthikulpanit
  2 siblings, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-10 19:34 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-09 21:55+0100, Radim Krčmář:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> +static int avic_noaccel_interception(struct vcpu_svm *svm)
>> +{
>> +	int ret = 0;
>> +	u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
>> +	u32 rw = (svm->vmcb->control.exit_info_1 >> 32) & 0x1;
> 
> Change "u32 rw" to "bool write"
> 
>> +	u32 vector = svm->vmcb->control.exit_info_2 & 0xFFFFFFFF;
> 
> and please #define those masks.

I reconsidered.  Other users are being removed, so these masks will be
used only once and properly named variables are better in that case.

vector isn't used right now ... is the EOI vector you get by reading the
APIC page equal to it?

(Btw. edge EOI hacks for PIT and RTC won't work because AVIC doesn't
 exit when TMR=0, but I'd take it slow and tackle those after v3.)

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-10 19:34     ` Radim Krčmář
@ 2016-03-10 19:54       ` Paolo Bonzini
  2016-03-10 20:44         ` Radim Krčmář
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-10 19:54 UTC (permalink / raw)
  To: Radim Krčmář, Suravee Suthikulpanit
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz



On 10/03/2016 20:34, Radim Krčmář wrote:
> (Btw. edge EOI hacks for PIT and RTC won't work because AVIC doesn't
>  exit when TMR=0, but I'd take it slow and tackle those after v3.)

I'd just only enable AVIC for split irqchip...

Paolo

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-10 19:54       ` Paolo Bonzini
@ 2016-03-10 20:44         ` Radim Krčmář
  0 siblings, 0 replies; 54+ messages in thread
From: Radim Krčmář @ 2016-03-10 20:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Suravee Suthikulpanit, joro, bp, gleb, alex.williamson, kvm,
	linux-kernel, wei, sherry.hurwitz

2016-03-10 20:54+0100, Paolo Bonzini:
> On 10/03/2016 20:34, Radim Krčmář wrote:
>> (Btw. edge EOI hacks for PIT and RTC won't work because AVIC doesn't
>>  exit when TMR=0, but I'd take it slow and tackle those after v3.)
> 
> I'd just only enable AVIC for split irqchip...

Great idea.

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-07 15:36   ` Paolo Bonzini
  2016-03-08 21:54     ` Radim Krčmář
@ 2016-03-14  5:25     ` Suravee Suthikulpanit
  2016-03-14  8:54       ` Suravee Suthikulpanit
  1 sibling, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  5:25 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 03/07/2016 10:36 PM, Paolo Bonzini wrote:
>
>
> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
>> +
>> +	if (vcpu->mode == IN_GUEST_MODE) {
>> +		wrmsrl(SVM_AVIC_DOORBELL,
>> +		       __default_cpu_present_to_apicid(vcpu->cpu));
>> +	} else {
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>
> You also need to add
>
> 	kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> before the "if", similar to vmx_deliver_posted_interrupt.
>
> Paolo
>

Actually, I should only need that just before the kvm_vcpu_kick(vcpu),
shouldn't I? I don't think we need it in the case when sending the doorbell.

Thanks,
Suravee

* Re: [PART1 RFC v2 09/10] svm: Do not intercept CR8 when enable AVIC
  2016-03-07 15:39   ` Paolo Bonzini
@ 2016-03-14  6:09     ` Suravee Suthikulpanit
  2016-03-14 12:28       ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  6:09 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 03/07/2016 10:39 PM, Paolo Bonzini wrote:
>> +		svm_x86_ops.update_cr8_intercept = NULL;
>> >  	} else {
>> >  		svm_x86_ops.deliver_posted_interrupt = NULL;
>> >  	}
>> >@@ -1116,7 +1119,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>> >  	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>> >  	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
>> >  	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
>> >-	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>> >+	if (!avic)
> Remember that AVIC enabled/disabled must be refreshed when the
> .refresh_apicv_exec_ctrl callback is invoked, so it is not enough to use
> the global variable.
>
> Paolo
>

Good point. I'll fix this.  By the way, how can we enable APICv only
for certain VMs? Does Qemu/KVM have any specific flags that we can pass
to enable/disable this?

Thanks,
Suravee

* Re: [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking
  2016-03-07 15:42   ` Paolo Bonzini
@ 2016-03-14  6:19     ` Suravee Suthikulpanit
  0 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  6:19 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 03/07/2016 10:42 PM, Paolo Bonzini wrote:
>
> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> >
>> >+static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
>> >+{
>> >+	avic_set_running(vcpu, false);
>> >+}
>> >+
>> >+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
>> >+{
>> >+	avic_set_running(vcpu, true);
>> >+}
>> >+
>> >  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
>> >  {
>> >  	return to_svm(vcpu)->vmcb->save.rflags;
>> >@@ -4321,6 +4331,8 @@ static struct kvm_x86_ops svm_x86_ops = {
>> >  	.prepare_guest_switch = svm_prepare_guest_switch,
>> >  	.vcpu_load = svm_vcpu_load,
>> >  	.vcpu_put = svm_vcpu_put,
>> >+	.vcpu_blocking = svm_vcpu_blocking,
>> >+	.vcpu_unblocking = svm_vcpu_unblocking,
> These two hunks should be added to patch 10.
>
> Paolo
>

Right. Fixed. Thanks for catching this.

Suravee

* Re: [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers
  2016-03-07 15:44   ` Paolo Bonzini
@ 2016-03-14  7:41     ` Suravee Suthikulpanit
  2016-03-14 12:25       ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  7:41 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 03/07/2016 10:44 PM, Paolo Bonzini wrote:
>
> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> >From: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>> >
>> >Introduce new AVIC VMCB registers. Also breakdown int_ctl register
>> >into bit-field for ease of use.
>> >
>> >Signed-off-by: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>> >---
>> >  arch/x86/include/asm/svm.h | 29 ++++++++++++++++++++++++-----
>> >  1 file changed, 24 insertions(+), 5 deletions(-)
>> >
>> >diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> >index 6136d99..db5d7af 100644
>> >--- a/arch/x86/include/asm/svm.h
>> >+++ b/arch/x86/include/asm/svm.h
>> >@@ -67,10 +67,24 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>> >  	u32 asid;
>> >  	u8 tlb_ctl;
>> >  	u8 reserved_2[3];
>> >-	u32 int_ctl;
>> >+	union { /* Offset 0x60 */
>> >+		u32 int_ctl;
>> >+
>> >+		struct __attribute__ ((__packed__)) {
>> >+			u32 v_tpr		: 8,
>> >+			    v_irq		: 1,
>> >+			    reserved_3		: 7,
>> >+			    v_intr_prio		: 4,
>> >+			    v_ign_tpr		: 1,
>> >+			    reserved_4		: 3,
>> >+			    v_intr_masking	: 1,
>> >+			    reserved_5		: 6,
>> >+			    avic_enable		: 1;
> Please do not introduce bitfields and drop patch 4.
>
> Thanks,
>
> Paolo
>

Any particular reason why you do not recommend the use of bit field?

Thanks,
Suravee

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-14  5:25     ` Suravee Suthikulpanit
@ 2016-03-14  8:54       ` Suravee Suthikulpanit
  0 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  8:54 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

Please ignore this. I should have read the whole thread before replying :(
Sorry for the spam.

Suravee

On 03/14/2016 12:25 PM, Suravee Suthikulpanit wrote:
>
>
> On 03/07/2016 10:36 PM, Paolo Bonzini wrote:
>>
>>
>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>>> +{
>>> +    struct vcpu_svm *svm = to_svm(vcpu);
>>> +
>>> +    kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
>>> +
>>> +    if (vcpu->mode == IN_GUEST_MODE) {
>>> +        wrmsrl(SVM_AVIC_DOORBELL,
>>> +               __default_cpu_present_to_apicid(vcpu->cpu));
>>> +    } else {
>>> +        kvm_vcpu_kick(vcpu);
>>> +    }
>>
>> You also need to add
>>
>>     kvm_make_request(KVM_REQ_EVENT, vcpu);
>>
>> before the "if", similar to vmx_deliver_posted_interrupt.
>>
>> Paolo
>>
>
> Actually, I should only need that just before the kvm_vcpu_kick(vcpu),
> shouldn't I? I don't think we need it in the case when sending the doorbell.
>
> Thanks,
> Suravee

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-09 16:00         ` Radim Krčmář
@ 2016-03-14  9:41           ` Suravee Suthikulpanit
  2016-03-14 12:27             ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  9:41 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 03/09/2016 11:00 PM, Radim Krčmář wrote:
> 2016-03-09 12:10+0100, Paolo Bonzini:
>> On 08/03/2016 22:54, Radim Krčmář wrote:
>>> 2016-03-07 16:36+0100, Paolo Bonzini:
>>>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>>>>> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>>>>> +{
>>>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>>>> +
>>>>> +	kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm, APIC_IRR));
>>>
>>> (I think that smp_mb here would make sense, even though we're fine now
>>>   thanks to re-checking vcpu->mode in kvm_vcpu_kick.
>>
>> Right, though only a smp_mb__after_atomic() is required (which is a
>> compiler barrier).  It is similarly required in vmx.
>
> True, kvm_lapic_set_vector uses a lock prefix.
>
> (I thought it behaves like atomic_set, which would require MFENCE for
>   correct ordering here ... I don't like smp_mb__after_atomic much
>   because of the discrepancy on some atomic operations.)
>

So, should I just use smp_mb() in this case?

Suravee.

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-09 11:10       ` Paolo Bonzini
  2016-03-09 16:00         ` Radim Krčmář
@ 2016-03-14  9:50         ` Suravee Suthikulpanit
  1 sibling, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14  9:50 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 03/09/2016 06:10 PM, Paolo Bonzini wrote:
>>>> +
>>>> >>>+	if (vcpu->mode == IN_GUEST_MODE) {
>>>> >>>+		wrmsrl(SVM_AVIC_DOORBELL,
>>>> >>>+		       __default_cpu_present_to_apicid(vcpu->cpu));
>>>> >>>+	} else {
>>>> >>>+		kvm_vcpu_kick(vcpu);
>>>> >>>+	}
 >> >
>> >And what about
>> >   [...]
>> >   else if (!vcpu->...->is_running)
>> >   	kvm_vcpu_kick(vcpu);
>> >
>> >?
>> >The kick isn't needed unless the VCPU is scheduled out.
>> >Or maybe just
>> >   if (vcpu->...->is_running)
>> >     wrmsrl()
>> >   else
>> >     kvm_vcpu_kick();
>> >?
>> >Which doesn't use the information we have on top AVIC, making our logic
>> >a bit simpler.
> Yes, both of this should work.  I like the latter.

Ok, I'll modify this to check the is_running bit of the AVIC Physical 
APIC ID table entry to determine if we should kick vcpu.

Thanks,
Suravee

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-09 21:46   ` Radim Krčmář
  2016-03-10 14:01     ` Radim Krčmář
@ 2016-03-14 11:48     ` Suravee Suthikulpanit
  2016-03-14 16:40       ` Radim Krčmář
  1 sibling, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14 11:48 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi,

On 03/10/2016 04:46 AM, Radim Krčmář wrote:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>
>> When a vcpu is loaded/unloaded to a physical core, we need to update
>> information in the Physical APIC-ID table accordingly.
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> @@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
>> +static inline int
>> +avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
>> +{
>> +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
>> +		return 0;
>> +
>> +	/* TODO: We will hook up with IOMMU API at later time */
>
> (It'd be best to drop avic_update_iommu from this series and introduce
>   it when merging iommu support.)
>

I just kept it there to make merging the code between part 1 and part 2,
which I have been testing, simpler. I didn't think it would cause much
confusion, but if you think that might be the case, I can drop it for now.

>> +	return 0;
>> +}
>> +
>> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
>
> This function does a lot and there is only one thing that must be done
> in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
> The rest can be done elsewhere:
>   - is_running when blocking.

I added the logic here to track whether is_running is set when unloading,
since I noticed the case where the vCPU is busy doing some work for the
guest, then gets unloaded and later on gets loaded again w/o
blocking/unblocking. So, in theory, it should still be considered running
during the unloaded period, and the flag should be restored when it is
loaded again.

>   - kb_pg_ptr when the pointer changes = only on initialization?

The reason I put this here is mainly that it is a convenient place to
set the vAPIC backing page address, since we already have to set up the
host physical APIC ID. I guess I can try setting this separately during
vcpu create, but I don't think it would make much difference.

>   - valid when the kb_pg_ptr is valid = always for existing VCPUs?

According to the programming manual, the valid bit is set when we set
the host physical APIC ID. However, in theory, the vAPIC backing page
address is required for the AVIC hardware to set bits in the IRR register,
while the host physical APIC ID is needed for sending the doorbell to the
target physical core. So, I would actually interpret the valid bit as
meaning it should be set once the vAPIC backing page address is set.

In the current implementation, the valid bit is set during vcpu load,
but is not unset during unload. This actually reflects the
interpretation described above.

If we decide to move the setting of the vAPIC backing page address to
the vcpu create phase, we would set the valid bit at that point as well.

Please let me know if you think differently.

>> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
>
> avic_set_running should get address of the entry and write is_run to it.
> (No need to touch avic_bk_page, h_phy_apic_id or do bound checks.)
>
> I think you can cache the physical apic id table entry in *vcpu, so both
> functions are going to be very simple.
>

I was actually thinking about doing this. Lemme try to come up with the 
logic to cache this.

Thanks,
Suravee

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-10 14:01     ` Radim Krčmář
@ 2016-03-14 11:58       ` Suravee Suthikulpanit
  2016-03-14 16:54         ` Radim Krčmář
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-14 11:58 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi,

On 03/10/2016 09:01 PM, Radim Krčmář wrote:
> Well, we haven't reached an agreement on is_running yet.  The situation:
> if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
> gets scheduled on vcpu1's physical core, then vcpu2 would receive a
> doorbell intended to vcpu1.

That's why, in V2, I added logic to check whether the is_running bit is
set for the current vcpu (e.g. vcpu1) when it is unloaded, and then
restore the bit during a later load if it was set before the previous
unload. This way, when we load the new vcpu (e.g. vcpu2), is_running
will be set as it was before unloading.

> We'd like to keep is_running set when there is no reason to vmexit, but
> not if a guest can negatively affect other guests.

Not sure how this can affect other guests?

> How does receiving a stray doorbell affect the performance?

As far as I know, the doorbell only affects the CPU during vmrun. Once 
received, it will check the IRR in the vAPIC backing page.  So, I think 
if the IRR bit is not set, the effect should be rather minimal.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers
  2016-03-14  7:41     ` Suravee Suthikulpanit
@ 2016-03-14 12:25       ` Paolo Bonzini
  2016-03-15 12:51         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-14 12:25 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 14/03/2016 08:41, Suravee Suthikulpanit wrote:
> Any particular reason why you do not recommend the use of bit field?

1) The current coding style is generally not using bitfields

2) Having to review patches that change working code unrelated to AVIC

3) Most of the fields are not even used when AVIC is enabled, so the
benefit of the conversion is small.

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC
  2016-03-14  9:41           ` Suravee Suthikulpanit
@ 2016-03-14 12:27             ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-14 12:27 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Radim Krčmář
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz



On 14/03/2016 10:41, Suravee Suthikulpanit wrote:
>>>>>>
>>>>>> +    kvm_lapic_set_vector(vec, avic_get_bk_page_entry(svm,
>>>>>> APIC_IRR));
>>>>
>>>> (I think that smp_mb here would make sense, even though we're fine now
>>>>   thanks to re-checking vcpu->mode in kvm_vcpu_kick.
>>>
>>> Right, though only a smp_mb__after_atomic() is required (which is a
>>> compiler barrier).  It is similarly required in vmx.
>>
>> True, kvm_lapic_set_vector uses a lock prefix.
>>
>> (I thought it behaves like atomic_set, which would require MFENCE for
>>   correct ordering here ... I don't like smp_mb__after_atomic much
>>   because of the discrepancy on some atomic operations.)
>>
> 
> So, should i just use smb_mb() in this case?

You should use smp_mb__after_atomic().

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 09/10] svm: Do not intercept CR8 when enable AVIC
  2016-03-14  6:09     ` Suravee Suthikulpanit
@ 2016-03-14 12:28       ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-14 12:28 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 14/03/2016 07:09, Suravee Suthikulpanit wrote:
> By the way, how can we enable APICv only in certain VM? Does Qemu/KVM
> have any specific flags that we can pass to enable/disable this?

You can use "-cpu kvm64,+hv-synic".

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-14 11:48     ` Suravee Suthikulpanit
@ 2016-03-14 16:40       ` Radim Krčmář
  0 siblings, 0 replies; 54+ messages in thread
From: Radim Krčmář @ 2016-03-14 16:40 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-14 18:48+0700, Suravee Suthikulpanit:
> On 03/10/2016 04:46 AM, Radim Krčmář wrote:
>>2016-03-04 14:46-0600, Suravee Suthikulpanit:
>>>From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>>
>>>When a vcpu is loaded/unloaded to a physical core, we need to update
>>>information in the Physical APIC-ID table accordingly.
>>>
>>>Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>>---
>>>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>@@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
>>>+static inline int
>>>+avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
>>>+{
>>>+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
>>>+		return 0;
>>>+
>>>+	/* TODO: We will hook up with IOMMU API at later time */
>>
>>(It'd be best to drop avic_update_iommu from this series and introduce
>>  it when merging iommu support.)
>>
> 
> I just kept it there to make code merging b/w part1 and 2 that I have been
> testing simpler. I didn't think it would cause much confusion. But if you
> think that might be the case, I can drop it for now.

The iommu part might end up having different requirements for this
function, so this husk can only add work when compared to waiting.
And avic_update_iommu is logically separable, so it would be nicer as a
short separate patch anyway.  (It's not a problem if you leave it.)

>>>+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
>>
>>This function does a lot and there is only one thing that must be done
>>in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
>>The rest can be done elsewhere:
>>  - is_running when blocking.
> 
I added the logic here to track whether is_running is set when unloading,
since I noticed the case where the vCPU is busy doing some work for the
guest, then gets unloaded and later on gets loaded again w/o
blocking/unblocking. So, in theory, it should still be considered running
during the unloaded period, and we should restore this flag when it is
loaded again.

(A second mail will be related to this.)

>>  - kb_pg_ptr when the pointer changes = only on initialization?
> 
The reason I put this here is mainly that it is a convenient place to set
the vAPIC backing page address, since we already have to set up the host
physical APIC id. I guess I can try setting this separately during vcpu
create, but I don't think it would make much difference.

vcpu_load isn't as hot as vcpu_run, but it's still called often, and the
most useful optimization is to avoid unnecessary operations ...
(I think the split code is going to be easier to understand as well.)

>>  - valid when the kb_pg_ptr is valid = always for existing VCPUs?
> 
> According to the programming manual, the valid bit is set when we set the
> host physical APIC ID.

(Physical APIC ID doesn't affect the valid bit at all.)

>                        However, in theory, the vAPIC backing page address is
> required for AVIC hardware to set bits in IRR register, while the host
> physical APIC ID is needed for sending doorbell to the target physical core.
> So, I would actually interpret the valid bit as it should be set when the
> vAPIC backing address is set.

Yes, APM (rev 3.23, vol 2, table 15-24):
  Valid bit. When set, indicates that this entry contains a valid vAPIC
  backing page pointer. If cleared, this table entry contains no
  information.

> In the current implementation, the valid bit is set during vcpu load, but is
> not unset when the vCPU is unloaded. This actually reflects the
> interpretation of the description above.
> 
> If we decide to move the setting of the vAPIC backing page address to the
> vcpu create phase, we would set the valid bit at that point as well.
> 
> Please let me know if you think differently.

I agree with your analysis and setting the backing page and valid bit on
LAPIC creation seems better to me.

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 10/10] svm: Manage vcpu load/unload when enable AVIC
  2016-03-14 11:58       ` Suravee Suthikulpanit
@ 2016-03-14 16:54         ` Radim Krčmář
  0 siblings, 0 replies; 54+ messages in thread
From: Radim Krčmář @ 2016-03-14 16:54 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-14 18:58+0700, Suravee Suthikulpanit:
> On 03/10/2016 09:01 PM, Radim Krčmář wrote:
>>Well, we haven't reached an agreement on is_running yet.  The situation:
>>if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
>>gets scheduled on vcpu1's physical core, then vcpu2 would receive a
>>doorbell intended for vcpu1.
> 
> That's why, in V2, I added the logic to check if the is_running bit is set
> for the current vcpu (e.g. vcpu1) when it is unloaded, then restore the bit
> when it is loaded again later if it was set at the previous unload. This
> way, when we load the new vcpu (e.g. vcpu2), its is_running will be set as
> it was before unloading.

Yes, that's a good solution and I'm leaning towards it.  The downside is
that IPIs from other VCPUs exit, even though KVM can't do anything,
because the vCPU is already going to run as soon as it can.
Keeping is_running set during unload would prevent meaningless exits.

>>We'd like to keep is_running set when there is no reason to vmexit, but
>>not if a guest can negatively affect other guests.
> 
> Not sure how this can affect other guests?

If is_running is set, then the doorbell is sent to a physical core, so
any running task/vCPU will receive it.  This is safe, but a difference
can be seen in performance.

>>How does receiving a stray doorbell affect the performance?
> 
> As far as I know, the doorbell only affects the CPU during vmrun.

Yeah, I guess that receiving a doorbell outside of vmrun has no
overhead.

>                                                                   Once
> received, it will check the IRR in the vAPIC backing page.  So, I think if
> the IRR bit is not set, the effect should be rather minimal.

Even empty IRR still needs to be rescanned every time a doorbell
arrives, which might affect the execution pipeline.

After re-reading all relevant quotes, I think that hardware wasn't
designed with this use in mind, so it's safer to assume an adverse
effect and go with solution we have now.  (It'd be hard to measure
anyway.)

Sorry for the tangent.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers
  2016-03-14 12:25       ` Paolo Bonzini
@ 2016-03-15 12:51         ` Suravee Suthikulpanit
  0 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-15 12:51 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 03/14/2016 07:25 PM, Paolo Bonzini wrote:
>
>
> On 14/03/2016 08:41, Suravee Suthikulpanit wrote:
>> Any particular reason why you do not recommend the use of bit field?
>
> 1) The current coding style is generally not using bitfields
>
> 2) Having to review patches that change working code unrelated to AVIC
>
> 3) Most of the fields are not even used when AVIC is enabled, so the
> benefit of the conversion is small.
>
> Paolo
>

Ok, I'll remove the bit-field stuff from patch 3 and will get rid of 
patch 4.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-07 16:41   ` Paolo Bonzini
@ 2016-03-15 17:09     ` Suravee Suthikulpanit
  2016-03-15 17:22       ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-15 17:09 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi

On 03/07/2016 11:41 PM, Paolo Bonzini wrote:
> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
> >  [....]
>> +/* Note: This structure is per VM */
>> +struct svm_vm_data {
>> +	atomic_t count;
>> +	u32 ldr_mode;
>> +	u32 avic_max_vcpu_id;
>> +	u32 avic_tag;
>> +
>> +	struct page *avic_log_ait_page;
>> +	struct page *avic_phy_ait_page;
>
> You can put these directly in kvm_arch.  Do not use abbreviations:
>
> 	struct page *avic_logical_apic_id_table_page;
> 	struct page *avic_physical_apic_id_table_page;
>

Actually, the reason I would like to introduce this per-arch specific 
structure is that I feel it is easier to manage these processor-specific 
variables/data structures. If we add all of these directly into 
kvm_arch, which is shared b/w SVM and VMX, it is more difficult to tell 
which ones are used in the different code bases.

>> [...]
>> +	memcpy(vapic_bkpg, svm->in_kernel_lapic_regs, PAGE_SIZE);
>> +	svm->vcpu.arch.apic->regs = vapic_bkpg;
>
> Can you explain the flipping logic, and why you cannot just use the
> existing apic.regs?

Please see "explanation 1" below.

>> [...]
>> +static struct svm_avic_phy_ait_entry *
>> +avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
>> +{
>> +    [.....]
>> +}
>> +
>> +struct svm_avic_log_ait_entry *
>> +avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
>> +{
>> +	[.....]
>> +}
>
> Instead of these functions, create a complete function to handle APIC_ID
> and APIC_LDR writes.  Then use kmap/kunmap instead of page_address.
>

Ok. May I ask why we are against using page_address?  I have seen it 
used in several places in the code.

>> [...]
>> +static int avic_alloc_bk_page(struct vcpu_svm *svm, int id)
>> +{
>> +	int ret = 0, i;
>> +	bool realloc = false;
>> +	struct kvm_vcpu *vcpu;
>> +	struct kvm *kvm = svm->vcpu.kvm;
>> +	struct svm_vm_data *vm_data = kvm->arch.arch_data;
>> +
>> +	mutex_lock(&kvm->slots_lock);
>> +
>> +	/* Check if we have already allocated vAPIC backing
>> +	 * page for this vCPU. If not, we need to realloc
>> +	 * a new one and re-assign all other vCPU.
>> +	 */
>> +	if (kvm->arch.apic_access_page_done &&
>> +	    (id > vm_data->avic_max_vcpu_id)) {
>> +		kvm_for_each_vcpu(i, vcpu, kvm)
>> +			avic_unalloc_bk_page(vcpu);
>> +
>> +		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
>> +					0, 0);
>> +		realloc = true;
>> +		vm_data->avic_max_vcpu_id = 0;
>> +	}
>> +
>> +	/*
>> +	 * We are allocating vAPIC backing page
>> +	 * upto the max vCPU ID
>> +	 */
>> +	if (id >= vm_data->avic_max_vcpu_id) {
>> +		ret = __x86_set_memory_region(kvm,
>> +					      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
>> +					      APIC_DEFAULT_PHYS_BASE,
>> +					      PAGE_SIZE * (id + 1));
>
> Why is this necessary?  The APIC access page is a peculiarity of Intel
> processors (and the special memslot only needs to map 0xfee00000 to
> 0xfee00fff; after that there is the MSI area).
>

Please see "explanation 1" below.

 >> [...]
>> +		if (ret)
>> +			goto out;
>> +
>> +		vm_data->avic_max_vcpu_id = id;
>> +	}
>> +
>> +	/* Reinit vAPIC backing page for exisinting vcpus */
>> +	if (realloc)
>> +		kvm_for_each_vcpu(i, vcpu, kvm)
>> +			avic_init_bk_page(vcpu);
>
> Why is this necessary?

Explanation 1:

The current lapic regs page is allocated using get_zeroed_page(), which 
can be paged out. If I use these pages for AVIC backing pages, it seems 
to cause the VM to slow down quite a bit due to a lot of page faults.

Currently, the AVIC backing pages are acquired from __x86_set_memory 
region() with APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, which maps the pages for 
address 0xfee00000 and above for VM to use. I mostly grab this from the 
VMX implementation in alloc_apic_access_page().

However, the memslot requires specifying the size at the time 
__x86_set_memory_region() is called, and I can't seem to figure out 
where I can get the number of vcpus at the time we create the VM. 
Therefore, I have to track vcpu creation and re-acquire a larger 
memslot every time vcpu_create() is called.

I was not sure if this is the right approach; any suggestions for this part?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-15 17:09     ` Suravee Suthikulpanit
@ 2016-03-15 17:22       ` Paolo Bonzini
  2016-03-16  6:22         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-15 17:22 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 15/03/2016 18:09, Suravee Suthikulpanit wrote:
> Hi
> 
> On 03/07/2016 11:41 PM, Paolo Bonzini wrote:
>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> >  [....]
>>> +/* Note: This structure is per VM */
>>> +struct svm_vm_data {
>>> +    atomic_t count;
>>> +    u32 ldr_mode;
>>> +    u32 avic_max_vcpu_id;
>>> +    u32 avic_tag;
>>> +
>>> +    struct page *avic_log_ait_page;
>>> +    struct page *avic_phy_ait_page;
>>
>> You can put these directly in kvm_arch.  Do not use abbreviations:
>>
>>     struct page *avic_logical_apic_id_table_page;
>>     struct page *avic_physical_apic_id_table_page;
>>
> 
> Actually, the reason I would like to introduce this per-arch specific
> structure is that I feel it is easier to manage these processor-specific
> variables/data structures. If we add all of these directly into kvm_arch,
> which is shared b/w SVM and VMX, it is more difficult to tell which ones
> are used in the different code bases.

You're right, but adding a pointer makes things slower and larger.
Using an anonymous union would work.  For now, I prefer to have the
fields directly in kvm_arch.

>>> [...]
>>> +static struct svm_avic_phy_ait_entry *
>>> +avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
>>> +{
>>> +    [.....]
>>> +}
>>> +
>>> +struct svm_avic_log_ait_entry *
>>> +avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
>>> +{
>>> +    [.....]
>>> +}
>>
>> Instead of these functions, create a complete function to handle APIC_ID
>> and APIC_LDR writes.  Then use kmap/kunmap instead of page_address.
>>
> 
> Ok. May I ask why we are against using page_address?  I have seen it
> used in several places in the code.

You're right, I guess page_address is okay for pages that were allocated
with alloc_page().

>> Why is this necessary?  The APIC access page is a peculiarity of Intel
>> processors (and the special memslot only needs to map 0xfee00000 to
>> 0xfee00fff; after that there is the MSI area).
> 
> The current lapic regs page is allocated using get_zeroed_page(), which
> can be paged out. If I use these pages for AVIC backing pages, it seems
> to cause the VM to slow down quite a bit due to a lot of page faults.

What causes the lapic regs page to be paged out?

> Currently, the AVIC backing pages are acquired from __x86_set_memory
> region() with APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, which maps the pages for
> address 0xfee00000 and above for VM to use. I mostly grab this from the
> VMX implementation in alloc_apic_access_page().
> 
> However, the memslot requires specifying the size at the time
> __x86_set_memory_region() is called, and I can't seem to figure out
> where I can get the number of vcpus at the time we create the VM.
> Therefore, I have to track vcpu creation and re-acquire a larger
> memslot every time vcpu_create() is called.

The purpose of the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is very specific:
it is there to provide a mapping for 0xfee00000 because Intel processors
trap writes between 0xfee00000 and 0xfee00fff, but otherwise ignore the
contents of the page you map there.  Intel processors only need
something to compare the physical address with, they don't care about
the data in the page, so they have one page per VM (they could really
use a single page in the whole system---one per VM is just how KVM works
right now).  It is a peculiar design, and one that you should probably
ignore in your AVIC patches.

The AVIC backing page is more similar to Intel's "virtual-APIC page".
You can see that vmx.c just uses lapic->regs for it.

        if (cpu_has_vmx_tpr_shadow() && !init_event) {
                vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
                if (cpu_need_tpr_shadow(vcpu))
                        vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
                                     __pa(vcpu->arch.apic->regs));
                vmcs_write32(TPR_THRESHOLD, 0);
        }

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-15 17:22       ` Paolo Bonzini
@ 2016-03-16  6:22         ` Suravee Suthikulpanit
  2016-03-16  7:20           ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-16  6:22 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 03/16/2016 12:22 AM, Paolo Bonzini wrote:
>>> Why is this necessary?  The APIC access page is a peculiarity of Intel
>>> >>processors (and the special memslot only needs to map 0xfee00000 to
>>> >>0xfee00fff; after that there is the MSI area).
>> >
>> >The current lapic regs page is allocated using get_zeroed_page(), which
>> >can be paged out. If I use these pages for AVIC backing pages, it seems
>> >to cause the VM to slow down quite a bit due to a lot of page faults.
> What causes the lapic regs page to be paged out?
>

This is mainly causing a large number of VMEXITs due to NPF in my test 
running hackbench in the guest. The following are perf results profiling 
the host for 10 seconds in two cases:

CASE1: Using x86_set_memory_region() for AVIC backing page

# ./perf-vmexit.sh 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.813 MB perf.data.guest (30356 samples) ]


Analyze events for all VMs, all VCPUs:

              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time     Avg time

            interrupt      10042    66.30%    81.33%      0.43us    202.50us      7.43us ( +-   1.20% )
                  msr       5004    33.04%    15.76%      0.73us     12.21us      2.89us ( +-   0.43% )
                pause         58     0.38%     0.18%      0.56us      5.88us      2.92us ( +-   6.43% )
                  npf         35     0.23%     2.01%      6.41us    207.78us     52.70us ( +-  23.67% )
                  nmi          4     0.03%     0.02%      2.31us      4.67us      3.49us ( +-  14.26% )
                   io          3     0.02%     0.70%     82.75us    360.90us    214.28us ( +-  37.64% )
      avic_incomp_ipi          1     0.01%     0.00%      2.17us      2.17us      2.17us ( +-   0.00% )

Total Samples:15147, Total events handled time:91715.78us.


CASE2: Using the lapic regs page for AVIC backing page.

# ./perf-vmexit.sh 10
[ perf record: Woken up 255 times to write data ]
[ perf record: Captured and wrote 509.202 MB perf.data.guest (5718856 samples) ]


Analyze events for all VMs, all VCPUs:

              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time     Avg time

                  npf    1897710    99.33%    98.08%      1.09us    243.22us      1.67us ( +-   0.04% )
            interrupt       7818     0.41%     1.44%      0.44us    216.55us      5.97us ( +-   1.92% )
                  msr       5001     0.26%     0.45%      0.68us     12.58us      2.89us ( +-   0.50% )
                pause         25     0.00%     0.00%      0.71us      4.23us      2.03us ( +-  10.76% )
                   io          4     0.00%     0.03%     73.91us    337.29us    206.74us ( +-  26.38% )
                  nmi          1     0.00%     0.00%      5.92us      5.92us      5.92us ( +-   0.00% )

Total Samples:1910559, Total events handled time:3229214.64us.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-16  6:22         ` Suravee Suthikulpanit
@ 2016-03-16  7:20           ` Paolo Bonzini
  2016-03-16  8:21             ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-16  7:20 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 16/03/2016 07:22, Suravee Suthikulpanit wrote:
> This is mainly causing a large number of VMEXITs due to NPF.

Got it, it's here in the manual: "System software is responsible for
setting up a translation in the nested page table granting guest read
and write permissions for accesses to the vAPIC Backing Page in SPA
space. AVIC hardware walks the nested page table to check permissions,
but does not use the SPA address specified in the leaf page table entry.
Instead, AVIC hardware finds this address in the AVIC_BACKING_PAGE
pointer field of the VMCB".

Strictly speaking the address of the 0xFEE00000 translation is
unnecessary and it could be all zeroes, but I suggest that you set up an
APIC access page like Intel does (4k only), using the special memslot.
The AVIC backing page can then point to lapic->regs.

Thanks for the explanation!

Paolo

> CASE1: Using x86_set_memory_region() for AVIC backing page
> 
> # ./perf-vmexit.sh 10
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.813 MB perf.data.guest (30356 samples) ]
> 
> 
> Analyze events for all VMs, all VCPUs:
> 
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time     Avg time
>
>            interrupt      10042    66.30%    81.33%      0.43us    202.50us      7.43us ( +-   1.20% )
>                  msr       5004    33.04%    15.76%      0.73us     12.21us      2.89us ( +-   0.43% )
>                pause         58     0.38%     0.18%      0.56us      5.88us      2.92us ( +-   6.43% )
>                  npf         35     0.23%     2.01%      6.41us    207.78us     52.70us ( +-  23.67% )
>                  nmi          4     0.03%     0.02%      2.31us      4.67us      3.49us ( +-  14.26% )
>                   io          3     0.02%     0.70%     82.75us    360.90us    214.28us ( +-  37.64% )
>      avic_incomp_ipi          1     0.01%     0.00%      2.17us      2.17us      2.17us ( +-   0.00% )
> 
> Total Samples:15147, Total events handled time:91715.78us.
> 
> 
> CASE2: Using the lapic regs page for AVIC backing page.
> 
> # ./perf-vmexit.sh 10
> [ perf record: Woken up 255 times to write data ]
> [ perf record: Captured and wrote 509.202 MB perf.data.guest (5718856 samples) ]
> 
> 
> Analyze events for all VMs, all VCPUs:
> 
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time     Avg time
>
>                  npf    1897710    99.33%    98.08%      1.09us    243.22us      1.67us ( +-   0.04% )
>            interrupt       7818     0.41%     1.44%      0.44us    216.55us      5.97us ( +-   1.92% )
>                  msr       5001     0.26%     0.45%      0.68us     12.58us      2.89us ( +-   0.50% )
>                pause         25     0.00%     0.00%      0.71us      4.23us      2.03us ( +-  10.76% )
>                   io          4     0.00%     0.03%     73.91us    337.29us    206.74us ( +-  26.38% )
>                  nmi          1     0.00%     0.00%      5.92us      5.92us      5.92us ( +-   0.00% )
> 
> Total Samples:1910559, Total events handled time:3229214.64us.
> 
> Thanks,
> Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-16  7:20           ` Paolo Bonzini
@ 2016-03-16  8:21             ` Suravee Suthikulpanit
  2016-03-16 11:12               ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-16  8:21 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz

Hi,

On 03/16/2016 02:20 PM, Paolo Bonzini wrote:
>
> On 16/03/2016 07:22, Suravee Suthikulpanit wrote:
>> >This is mainly causing a large number of VMEXITs due to NPF.
> Got it, it's here in the manual: "System software is responsible for
> setting up a translation in the nested page table granting guest read
> and write permissions for accesses to the vAPIC Backing Page in SPA
> space. AVIC hardware walks the nested page table to check permissions,
> but does not use the SPA address specified in the leaf page table entry.
> Instead, AVIC hardware finds this address in the AVIC_BACKING_PAGE
> pointer field of the VMCB".
>
> Strictly speaking the address of the 0xFEE00000 translation is
> unnecessary and it could be all zeroes, but I suggest that you set up an
> APIC access page like Intel does (4k only), using the special memslot.
> The AVIC backing page can then point to lapic->regs.
>
> Thanks for the explanation!
>
> Paolo
>

Ahh... you are right, this works too. Thanks for the pointer. I'm 
fixing this, doing some more testing, and cleaning up the code. This has 
simplified the init logic quite a bit.

Thanks for the suggestion,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support
  2016-03-16  8:21             ` Suravee Suthikulpanit
@ 2016-03-16 11:12               ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-16 11:12 UTC (permalink / raw)
  To: Suravee Suthikulpanit, rkrcmar, joro, bp, gleb, alex.williamson
  Cc: kvm, linux-kernel, wei, sherry.hurwitz



On 16/03/2016 09:21, Suravee Suthikulpanit wrote:
>> Strictly speaking the address of the 0xFEE00000 translation is
>> unnecessary and it could be all zeroes, but I suggest that you set up an
>> APIC access page like Intel does (4k only), using the special memslot.
>> The AVIC backing page can then point to lapic->regs.
> 
> Ahh... you are right, this works also. Thanks for the pointer. I'm
> fixing this, doing some more testing, and cleaning up the code. This has
> simplify the init logic quite a bit.

Awesome, thanks.

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-09 20:55   ` Radim Krčmář
  2016-03-10 19:34     ` Radim Krčmář
@ 2016-03-17  3:58     ` Suravee Suthikulpanit
  2016-03-17  9:35       ` Paolo Bonzini
  2016-03-17 19:44     ` Suravee Suthikulpanit
  2 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-17  3:58 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi Radim,

On 03/10/2016 03:55 AM, Radim Krčmář wrote:
>> >+	pr_debug("%s: offset=%#x, rw=%#x, vector=%#x, vcpu_id=%#x, cpu=%#x\n",
>> >+		 __func__, offset, rw, vector, svm->vcpu.vcpu_id, svm->vcpu.cpu);
>> >+
>> >+	BUG_ON(offset >= 0x400);
> These are valid faulting registers, so our implementation has to handle
> them.  (And the rule is to never BUG if a recovery is simple.)
>

Just want to clarify the part where you mentioned "to handle them". 
IIUC, offset 0x400 and above are for x2APIC stuff, which AVIC does not 
currently support. Also, since I have only advertised xAPIC when 
enabling AVIC, if we run into a situation where the VM is trying to 
access these registers, we should just ignore it (and not BUG). Do I 
understand that correctly?

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-17  3:58     ` Suravee Suthikulpanit
@ 2016-03-17  9:35       ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2016-03-17  9:35 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Radim Krčmář
  Cc: joro, bp, gleb, alex.williamson, kvm, linux-kernel, wei, sherry.hurwitz



On 17/03/2016 04:58, Suravee Suthikulpanit wrote:
>>>
>>> >+    BUG_ON(offset >= 0x400);
>> These are valid faulting registers, so our implementation has to handle
>> them.  (And the rule is to never BUG if a recovery is simple.)
>>
> 
> Just to clarify the part where you mentioned "to handle them": IIUC,
> offsets 0x400 and above are for x2APIC, which AVIC does not
> currently support. Also, since I have only advertised xAPIC support
> when enabling AVIC, if we run into the situation where the VM tries
> to access these registers, we should just ignore the access (and
> not BUG). Do I understand that correctly?

Yes.  You can add a printk(KERN_DEBUG) though.

Paolo


* Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC
  2016-03-09 20:55   ` Radim Krčmář
  2016-03-10 19:34     ` Radim Krčmář
  2016-03-17  3:58     ` Suravee Suthikulpanit
@ 2016-03-17 19:44     ` Suravee Suthikulpanit
  2016-03-17 20:27       ` [PATCH] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Radim Krčmář
  2 siblings, 1 reply; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-17 19:44 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi,

On 3/10/16 03:55, Radim Krčmář wrote:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> >From: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>> >
>> >Introduce VMEXIT handlers, avic_incp_ipi_interception() and
>> >avic_noaccel_interception().
>> >
>> >Signed-off-by: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>> >---
>> >diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> >@@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
>> >+	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
>> >+		kvm_for_each_vcpu(i, vcpu, kvm) {
>> >+			if (!kvm_apic_match_dest(vcpu, apic,
>> >+						 icrl & APIC_SHORT_MASK,
>> >+						 GET_APIC_DEST_FIELD(icrh),
>> >+						 icrl & APIC_DEST_MASK))
>> >+				continue;
>> >+
>> >+			kvm_vcpu_kick(vcpu);
> KVM shouldn't kick VCPUs that are running.  (Imagine a broadcast when
> most VCPUs are in guest mode.)

So, besides checking whether the vcpu matches the destination, I will
add a check of the is_running bit before calling kvm_vcpu_kick().

> I think a new helper might be useful here: we only want to wake up from
> wait queue, but never force VCPU out of guest mode ... kvm_vcpu_kick()
> does both.

If I only kick non-running vcpus, do I still need this new helper function?

Thanks,
Suravee


* [PATCH] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  2016-03-17 19:44     ` Suravee Suthikulpanit
@ 2016-03-17 20:27       ` Radim Krčmář
  2016-03-18  5:13         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 54+ messages in thread
From: Radim Krčmář @ 2016-03-17 20:27 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

2016-03-18 02:44+0700, Suravee Suthikulpanit:
> On 3/10/16 03:55, Radim Krčmář wrote:
>>2016-03-04 14:46-0600, Suravee Suthikulpanit:
>>>>From: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>>>>
>>>>Introduce VMEXIT handlers, avic_incp_ipi_interception() and
>>>>avic_noaccel_interception().
>>>>
>>>>Signed-off-by: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>>>>---
>>>>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>@@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
>>>>+	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
>>>>+		kvm_for_each_vcpu(i, vcpu, kvm) {
>>>>+			if (!kvm_apic_match_dest(vcpu, apic,
>>>>+						 icrl & APIC_SHORT_MASK,
>>>>+						 GET_APIC_DEST_FIELD(icrh),
>>>>+						 icrl & APIC_DEST_MASK))
>>>>+				continue;
>>>>+
>>>>+			kvm_vcpu_kick(vcpu);
>>KVM shouldn't kick VCPUs that are running.  (Imagine a broadcast when
>>most VCPUs are in guest mode.)
> 
> So, besides checking whether the vcpu matches the destination, I will
> add a check of the is_running bit before calling kvm_vcpu_kick().

That will do.

>>I think a new helper might be useful here: we only want to wake up from
>>wait queue, but never force VCPU out of guest mode ... kvm_vcpu_kick()
>>does both.
> 
> If I only kick non-running vcpus, do I still need this new helper function?

I would prefer it.  It's a minor performance optimization (non-running
VCPUs aren't in guest mode) and makes our intent clear.  Please
include the following patch and use kvm_vcpu_wake_up instead.

---8<---
AVIC has a use for kvm_vcpu_wake_up.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 861f690aa791..7b269626a3b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -650,6 +650,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1eae05236347..c39c54afdb74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2054,13 +2054,8 @@ out:
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
 
 #ifndef CONFIG_S390
-/*
- * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
- */
-void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
-	int me;
-	int cpu = vcpu->cpu;
 	wait_queue_head_t *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
@@ -2068,6 +2063,18 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 		wake_up_interruptible(wqp);
 		++vcpu->stat.halt_wakeup;
 	}
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
+
+/*
+ * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
+ */
+void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+	int me;
+	int cpu = vcpu->cpu;
+
+	kvm_vcpu_wake_up(vcpu);
 
 	me = get_cpu();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-- 
2.7.2


* Re: [PATCH] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  2016-03-17 20:27       ` [PATCH] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Radim Krčmář
@ 2016-03-18  5:13         ` Suravee Suthikulpanit
  0 siblings, 0 replies; 54+ messages in thread
From: Suravee Suthikulpanit @ 2016-03-18  5:13 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, joro, bp, gleb, alex.williamson, kvm, linux-kernel,
	wei, sherry.hurwitz

Hi,

On 03/18/2016 03:27 AM, Radim Krčmář wrote:
> 2016-03-18 02:44+0700, Suravee Suthikulpanit:
>> On 3/10/16 03:55, Radim Krčmář wrote:
>>> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>>>>> From: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>>>>>
>>>>> Introduce VMEXIT handlers, avic_incp_ipi_interception() and
>>>>> avic_noaccel_interception().
>>>>>
>>>>> Signed-off-by: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>>>>> ---
>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>> @@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
>>>>> +	case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
>>>>> +		kvm_for_each_vcpu(i, vcpu, kvm) {
>>>>> +			if (!kvm_apic_match_dest(vcpu, apic,
>>>>> +						 icrl & APIC_SHORT_MASK,
>>>>> +						 GET_APIC_DEST_FIELD(icrh),
>>>>> +						 icrl & APIC_DEST_MASK))
>>>>> +				continue;
>>>>> +
>>>>> +			kvm_vcpu_kick(vcpu);
>>> KVM shouldn't kick VCPUs that are running.  (Imagine a broadcast when
>>> most VCPUs are in guest mode.)
>>
>> So, besides checking whether the vcpu matches the destination, I will
>> add a check of the is_running bit before calling kvm_vcpu_kick().
>
> That will do.
>
>>> I think a new helper might be useful here: we only want to wake up from
>>> wait queue, but never force VCPU out of guest mode ... kvm_vcpu_kick()
>>> does both.
>>
>> If I only kick non-running vcpus, do I still need this new helper function?
>
> I would prefer it.  It's a minor performance optimization (non-running
> VCPUs aren't in guest mode) and makes our intent clear.  Please
> include the following patch and use kvm_vcpu_wake_up instead.
>
> ---8<---
> [ quoted patch snipped; identical to the patch in the parent message ]

Sure, I'll include this in my V3.

Thanks,
Suravee


end of thread, other threads:[~2016-03-18  5:13 UTC | newest]

Thread overview: 54+ messages
2016-03-04 20:45 [PART1 RFC v2 00/10] KVM: x86: Introduce SVM AVIC support Suravee Suthikulpanit
2016-03-04 20:45 ` [PART1 RFC v2 01/10] KVM: x86: Misc LAPIC changes to exposes helper functions Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 02/10] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking Suravee Suthikulpanit
2016-03-07 15:42   ` Paolo Bonzini
2016-03-14  6:19     ` Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 03/10] svm: Introduce new AVIC VMCB registers Suravee Suthikulpanit
2016-03-07 15:44   ` Paolo Bonzini
2016-03-14  7:41     ` Suravee Suthikulpanit
2016-03-14 12:25       ` Paolo Bonzini
2016-03-15 12:51         ` Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 04/10] svm: clean up V_TPR, V_IRQ, V_INTR_PRIO, and V_INTR_MASKING Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support Suravee Suthikulpanit
2016-03-07 16:41   ` Paolo Bonzini
2016-03-15 17:09     ` Suravee Suthikulpanit
2016-03-15 17:22       ` Paolo Bonzini
2016-03-16  6:22         ` Suravee Suthikulpanit
2016-03-16  7:20           ` Paolo Bonzini
2016-03-16  8:21             ` Suravee Suthikulpanit
2016-03-16 11:12               ` Paolo Bonzini
2016-03-04 20:46 ` [PART1 RFC v2 06/10] svm: Add interrupt injection via AVIC Suravee Suthikulpanit
2016-03-07 15:36   ` Paolo Bonzini
2016-03-08 21:54     ` Radim Krčmář
2016-03-09 11:10       ` Paolo Bonzini
2016-03-09 16:00         ` Radim Krčmář
2016-03-14  9:41           ` Suravee Suthikulpanit
2016-03-14 12:27             ` Paolo Bonzini
2016-03-14  9:50         ` Suravee Suthikulpanit
2016-03-14  5:25     ` Suravee Suthikulpanit
2016-03-14  8:54       ` Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC Suravee Suthikulpanit
2016-03-07 15:58   ` Paolo Bonzini
2016-03-08 22:05     ` Radim Krčmář
2016-03-09 10:56       ` Paolo Bonzini
2016-03-09 20:55   ` Radim Krčmář
2016-03-10 19:34     ` Radim Krčmář
2016-03-10 19:54       ` Paolo Bonzini
2016-03-10 20:44         ` Radim Krčmář
2016-03-17  3:58     ` Suravee Suthikulpanit
2016-03-17  9:35       ` Paolo Bonzini
2016-03-17 19:44     ` Suravee Suthikulpanit
2016-03-17 20:27       ` [PATCH] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick Radim Krčmář
2016-03-18  5:13         ` Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 08/10] svm: Do not expose x2APIC when enable AVIC Suravee Suthikulpanit
2016-03-04 20:46 ` [PART1 RFC v2 09/10] svm: Do not intercept CR8 " Suravee Suthikulpanit
2016-03-07 15:39   ` Paolo Bonzini
2016-03-14  6:09     ` Suravee Suthikulpanit
2016-03-14 12:28       ` Paolo Bonzini
2016-03-04 20:46 ` [PART1 RFC v2 10/10] svm: Manage vcpu load/unload " Suravee Suthikulpanit
2016-03-09 21:46   ` Radim Krčmář
2016-03-10 14:01     ` Radim Krčmář
2016-03-14 11:58       ` Suravee Suthikulpanit
2016-03-14 16:54         ` Radim Krčmář
2016-03-14 11:48     ` Suravee Suthikulpanit
2016-03-14 16:40       ` Radim Krčmář
