* [PATCHv5 0/5] apic: eoi optimization support
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

I'm looking at reducing the interrupt overhead for virtualized guests:
some workloads spend a large part of their time processing interrupts.

On kvm, an EOI write from the guest causes an expensive exit to host; we
avoid this using shared memory.

The patches work fine on my boxes. See the individual patches
for perf tests. You need to patch qemu to whitelist the kvm feature;
the qemu patch was sent separately.

The patches are against a merge of kvm.git (for apic announce bits) and
tip.git master. The last patch in the series, supplying the host
part, also depends on the ISR optimization patch that I sent previously:
changes to that optimization would require reworking the host-side
patch.

The rest of the patchset has not changed since v2.

Thanks,
MST

Changes from v4:
	Turn off PV EOI on each exit. Turn it back on when safe.
		Suggested by Avi.
	Address bug with nested interrupts pointed out by Marcelo.

Changes from v3:
	Address review comments by Marcelo:
		Multiple cosmetic changes eoi -> pv_eoi
		Added multiple comments
Changes from v2:
	Kill guest with GP on an illegal MSR value
	Add documentation

Changes from v1:
	Add host side patch to series
	Remove kvm-specific __test_and_clear_bit, document
	that x86 one does what we want already
	Clear msr on cpu unplug

Michael S. Tsirkin (5):
  kvm_para: guest side for eoi avoidance
  x86/bitops: note on __test_and_clear_bit atomicity
  kvm: eoi msi documentation
  kvm: only sync when attention bits set
  kvm: host side for eoi optimization

 Documentation/virtual/kvm/msr.txt |   32 +++++++++
 arch/x86/include/asm/bitops.h     |   13 +++-
 arch/x86/include/asm/kvm_host.h   |   12 +++
 arch/x86/include/asm/kvm_para.h   |    7 ++
 arch/x86/kernel/kvm.c             |   51 +++++++++++++-
 arch/x86/kvm/cpuid.c              |    1 +
 arch/x86/kvm/lapic.c              |  135 +++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h              |    2 +
 arch/x86/kvm/trace.h              |   34 +++++++++
 arch/x86/kvm/x86.c                |    8 ++-
 10 files changed, 285 insertions(+), 10 deletions(-)

-- 
MST


* [PATCHv5 1/5] kvm_para: guest side for eoi avoidance
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
The guest tests it using a single test-and-clear operation - this is
necessary so that the host can detect interrupt nesting - and if the
bit is set, the guest can skip the EOI MSR write.
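
In pseudo-C, the guest-side fast path added by this patch boils down
to the following (a sketch using names from the patch, not a literal
copy of the code):

	/* EOI hook: consume the PV EOI bit with a single instruction. */
	if (__test_and_clear_bit(KVM_PV_EOI_BIT, &kvm_apic_eoi))
		return;		/* bit was set: host will notice the clear */
	/* Bit was already clear (e.g. a nested interrupt was injected):
	 * fall back to the normal, exiting EOI write. */
	apic->write(APIC_EOI, APIC_EOI_ACK);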

I ran a simple microbenchmark to show the exit reduction
(note: for testing, you need to apply the follow-up patch
'kvm: host side for eoi optimization' plus a qemu patch
 I posted separately, on the host):

Before:

Performance counter stats for 'sleep 1s':

            47,357 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             5,001 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
            22,124 kvm:kvm_apic                                                 [99.98%]
            49,849 kvm:kvm_exit                                                 [99.98%]
            21,115 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
            22,937 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.98%]
            22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
            22,421 kvm:kvm_eoi                                                  [99.98%]
                 0 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
                57 kvm:kvm_emulate_insn                                         [99.99%]
                 0 kvm:vcpu_match_mmio                                          [99.99%]
                 0 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [99.99%]
               226 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002100578 seconds time elapsed

After:

 Performance counter stats for 'sleep 1s':

            28,354 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             1,347 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
             1,931 kvm:kvm_apic                                                 [99.98%]
            29,595 kvm:kvm_exit                                                 [99.98%]
            24,884 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
             1,986 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.99%]
            25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
            26,132 kvm:kvm_eoi                                                  [99.99%]
            26,593 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
               284 kvm:kvm_emulate_insn                                         [99.99%]
                68 kvm:vcpu_match_mmio                                          [99.99%]
                68 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [100.00%]
               588 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002039622 seconds time elapsed

We see that the number of exits is almost halved.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h   |    6 +++-
 arch/x86/include/asm/kvm_para.h |    7 +++++
 arch/x86/kernel/kvm.c           |   51 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index b97596e..c9c70ea 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -26,11 +26,13 @@
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
 /* Technically wrong, but this avoids compilation errors on some gcc
    versions. */
-#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "=m"
 #else
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "+m"
 #endif
 
+#define BITOP_ADDR(x) BITOP_ADDR_CONSTRAINT (*(volatile long *) (x))
+
 #define ADDR				BITOP_ADDR(addr)
 
 /*
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 183922e..e0ef324d1 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_EOI		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN      0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -89,6 +91,11 @@ struct kvm_vcpu_pv_apf_data {
 	__u32 enabled;
 };
 
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..85cd6ac 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,8 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 #include <asm/idle.h>
+#include <asm/apic.h>
+#include <asm/apicdef.h>
 
 static int kvmapf = 1;
 
@@ -283,6 +285,24 @@ static void kvm_register_steal_time(void)
 		cpu, __pa(st));
 }
 
+/* Size alignment is implied, but make it explicit anyway. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) =
+	KVM_PV_EOI_DISABLED;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+	/**
+	 * This relies on __test_and_clear_bit to modify the memory
+	 * in a way that is atomic with respect to the local CPU.
+	 * The hypervisor only accesses this memory from the local CPU so
+	 * there's no need for lock or memory barriers.
+	 * An optimization barrier is implied in apic write.
+	 */
+	if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
+		return;
+	apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -300,11 +320,17 @@ void __cpuinit kvm_guest_cpu_init(void)
 		       smp_processor_id());
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		__get_cpu_var(kvm_apic_eoi) = 0;
+		wrmsrl(MSR_KVM_PV_EOI_EN, __pa(&__get_cpu_var(kvm_apic_eoi)) |
+		       KVM_MSR_ENABLED);
+	}
+
 	if (has_steal_clock)
 		kvm_register_steal_time();
 }
 
-static void kvm_pv_disable_apf(void *unused)
+static void kvm_pv_disable_apf(void)
 {
 	if (!__get_cpu_var(apf_reason).enabled)
 		return;
@@ -316,11 +342,18 @@ static void kvm_pv_disable_apf(void *unused)
 	       smp_processor_id());
 }
 
+static void kvm_pv_guest_cpu_reboot(void *unused)
+{
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
+}
+
 static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
 	return NOTIFY_DONE;
 }
 
@@ -371,7 +404,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
 static void kvm_guest_cpu_offline(void *dummy)
 {
 	kvm_disable_steal_time();
-	kvm_pv_disable_apf(NULL);
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
 	apf_task_wake_all();
 }
 
@@ -424,6 +459,16 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		struct apic **drv;
+
+		for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+			/* Should happen once for each apic */
+			WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+			(*drv)->eoi_write = kvm_guest_apic_eoi_write;
+		}
+	}
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
-- 
MST



* [PATCHv5 2/5] x86/bitops: note on __test_and_clear_bit atomicity
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

__test_and_clear_bit is actually atomic with respect
to the local CPU. Add a note saying that KVM on x86
relies on this behaviour so people don't accidentally break it.
Also warn not to rely on this in portable code.
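
To illustrate why the atomicity matters here (a sketch, not part of
the patch): with a split test-then-clear, an interrupt - and hence,
in a guest, the hypervisor running on the same CPU - could update the
bit between the two accesses and the update would be lost:

	/* Racy: the bit can change between test_bit and clear_bit. */
	if (test_bit(nr, addr))
		clear_bit(nr, addr);

	/* Safe with respect to the local CPU: on x86 the
	 * read-modify-write is a single btr instruction, so no
	 * interrupt can land between the test and the clear. */
	old = __test_and_clear_bit(nr, addr);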

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index c9c70ea..86f3a1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
  * This operation is non-atomic and can be reordered.
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
+ *
+ * Note: the operation is performed atomically with respect to
+ * the local CPU, but not other CPUs. Portable code should not
+ * rely on this behaviour.
+ * KVM relies on this behaviour on x86 for modifying memory that is also
+ * accessed from a hypervisor on the same CPU if running in a VM: don't change
+ * this without also updating arch/x86/kernel/kvm.c
  */
 static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 {
-- 
MST



* [PATCHv5 3/5] kvm: eoi msi documentation
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Document the new EOI MSR. I couldn't decide whether this change belongs
conceptually on the guest or the host side, so it is a separate patch.
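
For illustration, guest usage of the MSR documented below looks
roughly like this (a sketch, not part of the patch; kvm_apic_eoi
stands for the guest's 2-byte memory area):

	/* Enable: bits 63-1 carry the physical address, bit 0 enables. */
	wrmsrl(MSR_KVM_PV_EOI_EN, __pa(&kvm_apic_eoi) | KVM_MSR_ENABLED);

	/* Disable again, e.g. before reboot or cpu unplug. */
	wrmsrl(MSR_KVM_PV_EOI_EN, 0);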

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Documentation/virtual/kvm/msr.txt |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 5031780..3481f1b 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,35 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_PV_EOI_EN: 0x4b564d04
+	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
+	when disabled.  When enabled, bits 63-1 hold the 2-byte aligned
+	physical address of a 2-byte memory area which must be in guest RAM
+	and must be zeroed.
+
+	The least significant bit of this 2-byte memory area is written by
+	the hypervisor, typically at the time of interrupt injection.  A
+	value of 1 means that the guest can skip writing EOI to the apic
+	(using an MSR or MMIO write); instead, it is sufficient to signal
+	EOI by clearing the bit in guest memory - this location will later
+	be polled by the hypervisor.  A value of 0 means that the EOI write
+	is required.
+
+	It is always safe for the guest to ignore the optimization and perform
+	the APIC EOI write anyway.
+
+	The hypervisor is guaranteed to only modify this least
+	significant bit while in the current VCPU context; this means that
+	the guest does not need to use either a lock prefix or memory
+	ordering primitives to synchronise with the hypervisor.
+
+	However, the hypervisor can set and clear this memory bit at any
+	time.  Therefore, to make sure the hypervisor does not interrupt
+	the guest and clear the least significant bit in the window between
+	the guest testing it (to detect whether it can skip the EOI apic
+	write) and the guest clearing it (to signal EOI to the hypervisor),
+	the guest must both read the least significant bit in the memory
+	area and clear it using a single CPU instruction, such as
+	test-and-clear or compare-and-exchange.
+
-- 
MST



* [PATCHv5 4/5] kvm: only sync when attention bits set
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced the apic
attention bitmask, but kvm still syncs the lapic unconditionally.
As that commit suggested, and in anticipation of adding more attention
bits, only sync the lapic if (apic_attention) is set.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/x86.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 185a2b8..3e57566 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5370,7 +5370,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(vcpu->arch.tsc_always_catchup))
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
-	kvm_lapic_sync_from_vapic(vcpu);
+	if (unlikely(vcpu->arch.apic_attention))
+		kvm_lapic_sync_from_vapic(vcpu);
 
 	r = kvm_x86_ops->handle_exit(vcpu);
 out:
-- 
MST



* [PATCHv5 5/5] kvm: host side for eoi optimization
From: Michael S. Tsirkin @ 2012-05-22 14:05 UTC
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Implementation of PV EOI using shared memory.
This reduces the number of exits an interrupt
causes by as much as half.

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear it
before injecting a nested one. The guest tests it using
a single test-and-clear operation - this is necessary
so that the host can detect interrupt nesting -
and if the bit is set, the guest can skip the EOI MSR.

There's a new MSR to set the address of said register
in guest memory. Otherwise not much changed:
- Guest EOI is not required
- Register is tested & ISR is automatically cleared on exit

For testing results see description of previous patch
'kvm_para: guest side for eoi avoidance'.
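
Condensed, the host-side protocol added below is (a sketch using
names from the patch; the real code has a few more checks):

	/* Before vmentry: grant PV EOI only in the safe case - exactly
	 * one interrupt in service, nothing pending in IRR, and no
	 * ioapic that needs to observe the EOI. */
	if (pv_eoi_enabled(vcpu) && !apic->irr_pending &&
	    apic->isr_count == 1 &&
	    !kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache))
		pv_eoi_set_pending(vcpu);	/* set bit in guest memory */

	/* After vmexit: if the guest consumed the bit, complete the
	 * EOI on its behalf; always re-arm from a clean state. */
	pending = pv_eoi_get_pending(vcpu);
	pv_eoi_clr_pending(vcpu);
	if (!pending)
		apic_set_eoi(apic);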

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   12 ++++
 arch/x86/kvm/cpuid.c            |    1 +
 arch/x86/kvm/lapic.c            |  135 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h            |    2 +
 arch/x86/kvm/trace.h            |   34 ++++++++++
 arch/x86/kvm/x86.c              |    5 ++
 6 files changed, 185 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32297f2..3ded418 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -174,6 +174,13 @@ enum {
 
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
+/*
+ * The following bit is set with PV-EOI, unset on EOI.
+ * We detect PV-EOI changes made by the guest by comparing
+ * this bit with the PV-EOI bit in guest memory.
+ * See the implementation in apic_sync_pv_eoi_from_guest.
+ */
+#define KVM_APIC_PV_EOI_PENDING	1
 
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
@@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+	} pv_eoi;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eba8345..a9f155a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
+			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 
 		if (sched_info_on())
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0d2985d..9d804e7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -309,6 +309,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 			irq->level, irq->trig_mode);
 }
 
+static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
+				      sizeof(val));
+}
+
+static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
+{
+
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
+				      sizeof(*val));
+}
+
+static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
+}
+
+static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
+{
+	u8 val;
+	if (pv_eoi_get_user(vcpu, &val) < 0)
+		apic_debug("Can't read EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+	return val & 0x1;
+}
+
+static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
+		apic_debug("Can't set EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+		return;
+	}
+	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
+static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
+		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+		return;
+	}
+	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
@@ -525,15 +573,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
-static void apic_set_eoi(struct kvm_lapic *apic)
+static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
+
+	trace_kvm_eoi(apic, vector);
+
 	/*
 	 * Not every write EOI will has corresponding ISR,
 	 * one example is when Kernel check timer on setup_IO_APIC
 	 */
 	if (vector == -1)
-		return;
+		return vector;
 
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
@@ -548,6 +599,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
 		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
 	}
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+	return vector;
 }
 
 static void apic_send_ipi(struct kvm_lapic *apic)
@@ -1130,6 +1182,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	atomic_set(&apic->lapic_timer.pending, 0);
 	if (kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
 
 	vcpu->arch.apic_arb_prio = 0;
@@ -1330,11 +1383,50 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
+/*
+ * apic_sync_pv_eoi_from_guest - called on vmexit
+ *
+ * Detect whether the guest triggered PV EOI since the
+ * last entry. If yes, set EOI on the guest's behalf.
+ */
+static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
+					struct kvm_lapic *apic)
+{
+	bool pending;
+	int vector;
+	/*
+	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
+	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
+	 *
+	 * KVM_APIC_PV_EOI_PENDING is unset:
+	 * 	-> host disabled PV EOI.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
+	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
+	 * 	-> host enabled PV EOI, guest executed EOI.
+	 */
+	BUG_ON(!pv_eoi_enabled(vcpu));
+	pending = pv_eoi_get_pending(vcpu);
+	/*
+	 * Clear pending bit in any case: it will be set again on vmentry.
+	 * While this might not be ideal from performance point of view,
+	 * this makes sure pv eoi is only enabled when we know it's safe.
+	 */
+	pv_eoi_clr_pending(vcpu);
+	if (pending)
+		return;
+	vector = apic_set_eoi(apic);
+	trace_kvm_pv_eoi(apic, vector);
+}
+
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
 {
 	u32 data;
 	void *vapic;
 
+	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
+		apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
+
 	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
 		return;
 
@@ -1345,17 +1437,40 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
 	apic_set_tpr(vcpu->arch.apic, data & 0xff);
 }
 
+/*
+ * apic_sync_pv_eoi_to_guest - called before vmentry
+ *
+ * Detect whether it's safe to enable PV EOI and
+ * if yes do so.
+ */
+static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
+					struct kvm_lapic *apic)
+{
+	if (!pv_eoi_enabled(vcpu) ||
+	    /* IRR set or many bits in ISR: could be nested. */
+	    unlikely(apic->irr_pending) ||
+	    unlikely(apic->isr_count != 1) ||
+	    /* Cache not set: safe but we don't bother. */
+	    unlikely(apic->isr_cache == -1) ||
+	    /* Need EOI to update ioapic. */
+	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
+		return;
+
+	pv_eoi_set_pending(apic->vcpu);
+}
+
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
 {
 	u32 data, tpr;
 	int max_irr, max_isr;
-	struct kvm_lapic *apic;
+	struct kvm_lapic *apic = vcpu->arch.apic;
 	void *vapic;
 
+	apic_sync_pv_eoi_to_guest(vcpu, apic);
+
 	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
 		return;
 
-	apic = vcpu->arch.apic;
 	tpr = apic_get_reg(apic, APIC_TASKPRI) & 0xff;
 	max_irr = apic_find_highest_irr(apic);
 	if (max_irr < 0)
@@ -1441,3 +1556,15 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 
 	return 0;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 addr = data & ~KVM_MSR_ENABLED;
+	if (pv_eoi_enabled(vcpu))
+		pv_eoi_clr_pending(vcpu);
+	vcpu->arch.pv_eoi.msr_val = data;
+	if (!pv_eoi_enabled(vcpu))
+		return 0;
+	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+					 addr);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9f8deff..41c62c7 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -62,4 +62,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
 #endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..851914e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
 		  __entry->coalesced ? " (coalesced)" : "")
 );
 
+TRACE_EVENT(kvm_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
+TRACE_EVENT(kvm_pv_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
 /*
  * Tracepoint for nested VMRUN
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3e57566..4fa26f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_EOI_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
+	case MSR_KVM_PV_EOI_EN:
+		if (kvm_lapic_enable_pv_eoi(vcpu, data))
+			return 1;
+		break;
 
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
-- 
MST


* Re: [PATCHv5 5/5] kvm: host side for eoi optimization
From: Gleb Natapov @ 2012-05-31  9:57 UTC
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Marcelo Tosatti, Linus Torvalds, linux-kernel, Thomas Gleixner

On Tue, May 22, 2012 at 05:05:47PM +0300, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes by as much as half.
> 
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear it
> before injecting a nested one. The guest tests it using
> a single test-and-clear operation - this is necessary
> so that the host can detect interrupt nesting -
> and if the bit is set, the guest can skip the EOI MSR.
> 
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
> 
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   12 ++++
>  arch/x86/kvm/cpuid.c            |    1 +
>  arch/x86/kvm/lapic.c            |  135 +++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/lapic.h            |    2 +
>  arch/x86/kvm/trace.h            |   34 ++++++++++
>  arch/x86/kvm/x86.c              |    5 ++
>  6 files changed, 185 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32297f2..3ded418 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -174,6 +174,13 @@ enum {
>  
>  /* apic attention bits */
>  #define KVM_APIC_CHECK_VAPIC	0
> +/*
> + * The following bit is set with PV-EOI, unset on EOI.
> + * We detect PV-EOI changes made by the guest by comparing
> + * this bit with the PV-EOI bit in guest memory.
> + * See the implementation in apic_sync_pv_eoi_from_guest.
> + */
> +#define KVM_APIC_PV_EOI_PENDING	1
>  
>  /*
>   * We don't want allocation failures within the mmu code, so we preallocate
> @@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
>  		u64 length;
>  		u64 status;
>  	} osvw;
> +
> +	struct {
> +		u64 msr_val;
> +		struct gfn_to_hva_cache data;
> +	} pv_eoi;
>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index eba8345..a9f155a 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> +			     (1 << KVM_FEATURE_PV_EOI) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>  
>  		if (sched_info_on())
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0d2985d..9d804e7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -309,6 +309,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>  			irq->level, irq->trig_mode);
>  }
>  
> +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> +{
> +
> +	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
> +				      sizeof(val));
> +}
> +
> +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> +{
> +
> +	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
> +				      sizeof(*val));
> +}
> +
> +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
> +}
> +
> +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
> +{
> +	u8 val;
> +	if (pv_eoi_get_user(vcpu, &val) < 0)
> +		apic_debug("Can't read EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +	return val & 0x1;
> +}
> +
> +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
> +		apic_debug("Can't set EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
> +		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>  	int result;
> @@ -525,15 +573,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>  	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>  }
>  
> -static void apic_set_eoi(struct kvm_lapic *apic)
> +static int apic_set_eoi(struct kvm_lapic *apic)
>  {
>  	int vector = apic_find_highest_isr(apic);
> +
> +	trace_kvm_eoi(apic, vector);
> +
>  	/*
>  	 * Not every write EOI will has corresponding ISR,
>  	 * one example is when Kernel check timer on setup_IO_APIC
>  	 */
>  	if (vector == -1)
> -		return;
> +		return vector;
>  
>  	apic_clear_isr(vector, apic);
>  	apic_update_ppr(apic);
> @@ -548,6 +599,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
>  		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>  	}
>  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> +	return vector;
>  }
>  
>  static void apic_send_ipi(struct kvm_lapic *apic)
> @@ -1130,6 +1182,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  	if (kvm_vcpu_is_bsp(vcpu))
>  		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> +	vcpu->arch.pv_eoi.msr_val = 0;
>  	apic_update_ppr(apic);
>  
>  	vcpu->arch.apic_arb_prio = 0;
> @@ -1330,11 +1383,50 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>  		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
>  }
>  
> +/*
> + * apic_sync_pv_eoi_from_guest - called on vmexit
> + *
> + * Detect whether the guest triggered PV EOI since the
> + * last entry. If yes, set EOI on the guest's behalf.
> + */
> +static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
> +					struct kvm_lapic *apic)
> +{
> +	bool pending;
> +	int vector;
> +	/*
> +	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
> +	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
> +	 *
> +	 * KVM_APIC_PV_EOI_PENDING is unset:
> +	 * 	-> host disabled PV EOI.
> +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
> +	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
> +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
> +	 * 	-> host enabled PV EOI, guest executed EOI.
> +	 */
> +	BUG_ON(!pv_eoi_enabled(vcpu));
> +	pending = pv_eoi_get_pending(vcpu);
> +	/*
> +	 * Clear pending bit in any case: it will be set again on vmentry.
> +	 * While this might not be ideal from performance point of view,
> +	 * this makes sure pv eoi is only enabled when we know it's safe.
> +	 */
> +	pv_eoi_clr_pending(vcpu);
> +	if (pending)
> +		return;
> +	vector = apic_set_eoi(apic);
> +	trace_kvm_pv_eoi(apic, vector);
> +}
> +
>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>  {
>  	u32 data;
>  	void *vapic;
>  
> +	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
> +		apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
> +
>  	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>  		return;
>  
> @@ -1345,17 +1437,40 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
>  }
>  
> +/*
> + * apic_sync_pv_eoi_to_guest - called before vmentry
> + *
> + * Detect whether it's safe to enable PV EOI and
> + * if yes do so.
> + */
> +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> +					struct kvm_lapic *apic)
> +{
> +	if (!pv_eoi_enabled(vcpu) ||
> +	    /* IRR set or many bits in ISR: could be nested. */
> +	    unlikely(apic->irr_pending) ||
> +	    unlikely(apic->isr_count != 1) ||
Remind me why pv_eoi should not be set if there is more than one isr?

> +	    /* Cache not set: safe but we don't bother. */
> +	    unlikely(apic->isr_cache == -1) ||
> +	    /* Need EOI to update ioapic. */
> +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> +		return;
> +
> +	pv_eoi_set_pending(apic->vcpu);
> +}
> +
apic_sync_pv_eoi_to_guest() is not paired with
apic_sync_pv_eoi_from_guest() if event injection is canceled.
You can enter guest with stale pv_eoi bit.

>  void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
>  {
>  	u32 data, tpr;
>  	int max_irr, max_isr;
> -	struct kvm_lapic *apic;
> +	struct kvm_lapic *apic = vcpu->arch.apic;
>  	void *vapic;
>  
> +	apic_sync_pv_eoi_to_guest(vcpu, apic);
> +
>  	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>  		return;
>  
> -	apic = vcpu->arch.apic;
>  	tpr = apic_get_reg(apic, APIC_TASKPRI) & 0xff;
>  	max_irr = apic_find_highest_irr(apic);
>  	if (max_irr < 0)
> @@ -1441,3 +1556,15 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
>  
>  	return 0;
>  }
> +
> +int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	u64 addr = data & ~KVM_MSR_ENABLED;
> +	if (pv_eoi_enabled(vcpu))
> +		pv_eoi_clr_pending(vcpu);
> +	vcpu->arch.pv_eoi.msr_val = data;
> +	if (!pv_eoi_enabled(vcpu))
> +		return 0;
> +	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
> +					 addr);
> +}
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 9f8deff..41c62c7 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -62,4 +62,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
>  }
> +
> +int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
>  #endif
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 911d264..851914e 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
>  		  __entry->coalesced ? " (coalesced)" : "")
>  );
>  
> +TRACE_EVENT(kvm_eoi,
> +	    TP_PROTO(struct kvm_lapic *apic, int vector),
> +	    TP_ARGS(apic, vector),
> +
> +	TP_STRUCT__entry(
> +		__field(	__u32,		apicid		)
> +		__field(	int,		vector		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->apicid		= apic->vcpu->vcpu_id;
> +		__entry->vector		= vector;
> +	),
> +
> +	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> +);
> +
> +TRACE_EVENT(kvm_pv_eoi,
> +	    TP_PROTO(struct kvm_lapic *apic, int vector),
> +	    TP_ARGS(apic, vector),
> +
> +	TP_STRUCT__entry(
> +		__field(	__u32,		apicid		)
> +		__field(	int,		vector		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->apicid		= apic->vcpu->vcpu_id;
> +		__entry->vector		= vector;
> +	),
> +
> +	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> +);
> +
>  /*
>   * Tracepoint for nested VMRUN
>   */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3e57566..4fa26f2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
>  	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
>  	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
>  	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> +	MSR_KVM_PV_EOI_EN,
>  	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
>  	MSR_STAR,
>  #ifdef CONFIG_X86_64
> @@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>  
>  		break;
> +	case MSR_KVM_PV_EOI_EN:
> +		if (kvm_lapic_enable_pv_eoi(vcpu, data))
> +			return 1;
> +		break;
>  
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
> -- 
> MST

--
			Gleb.


* Re: [PATCHv5 5/5] kvm: host side for eoi optimization
From: Michael S. Tsirkin @ 2012-05-31 10:11 UTC
  To: Gleb Natapov
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Marcelo Tosatti, Linus Torvalds, linux-kernel, Thomas Gleixner

On Thu, May 31, 2012 at 12:57:10PM +0300, Gleb Natapov wrote:
> > @@ -1345,17 +1437,40 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> >  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
> >  }
> >  
> > +/*
> > + * apic_sync_pv_eoi_to_guest - called before vmentry
> > + *
> > + * Detect whether it's safe to enable PV EOI and
> > + * if yes do so.
> > + */
> > +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> > +					struct kvm_lapic *apic)
> > +{
> > +	if (!pv_eoi_enabled(vcpu) ||
> > +	    /* IRR set or many bits in ISR: could be nested. */
> > +	    unlikely(apic->irr_pending) ||
> > +	    unlikely(apic->isr_count != 1) ||
> Remind me why pv_eoi should not be set if there is more than one isr?

There's a comment below: it might be safe, but we do not bother -
there is no easy way to know which interrupt has higher priority.

In my testing, more than one bit almost never happens in practice,
so it is not worth optimizing for.


> 
> > +	    /* Cache not set: safe but we don't bother. */
> > +	    unlikely(apic->isr_cache == -1) ||
> > +	    /* Need EOI to update ioapic. */
> > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> > +		return;
> > +
> > +	pv_eoi_set_pending(apic->vcpu);
> > +}
> > +
> apic_sync_pv_eoi_to_guest() is not paired with
> apic_sync_pv_eoi_from_guest() if event injection is canceled.
> You can enter guest with stale pv_eoi bit.

Never. The pv_eoi bit is cleared on each exit.
It will stay cleared unless we set it here.
I will add a comment.



* Re: [PATCHv5 5/5] kvm: host side for eoi optimization
From: Gleb Natapov @ 2012-05-31 10:15 UTC
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Marcelo Tosatti, Linus Torvalds, linux-kernel, Thomas Gleixner

On Thu, May 31, 2012 at 01:11:11PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 31, 2012 at 12:57:10PM +0300, Gleb Natapov wrote:
> > > @@ -1345,17 +1437,40 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > >  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
> > >  }
> > >  
> > > +/*
> > > + * apic_sync_pv_eoi_to_guest - called before vmentry
> > > + *
> > > + * Detect whether it's safe to enable PV EOI and
> > > + * if yes do so.
> > > + */
> > > +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> > > +					struct kvm_lapic *apic)
> > > +{
> > > +	if (!pv_eoi_enabled(vcpu) ||
> > > +	    /* IRR set or many bits in ISR: could be nested. */
> > > +	    unlikely(apic->irr_pending) ||
> > > +	    unlikely(apic->isr_count != 1) ||
> > Remind me why pv_eoi should not be set if there is more than one isr?
> 
> There's a comment below: it might be safe but
> we do not bother: no easy way to know which interrupt
> has higher priority.
Last injected interrupt has highest priority.

> 
> In my testing more than one bit almost never happens in practice so not
> worth optimizing for.
> 
> 
> > 
> > > +	    /* Cache not set: safe but we don't bother. */
> > > +	    unlikely(apic->isr_cache == -1) ||
> > > +	    /* Need EOI to update ioapic. */
> > > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> > > +		return;
> > > +
> > > +	pv_eoi_set_pending(apic->vcpu);
> > > +}
> > > +
> > apic_sync_pv_eoi_to_guest() is not paired with
> > apic_sync_pv_eoi_from_guest() if event injection is canceled.
> > You can enter guest with stale pv_eoi bit.
> 
> Never. The pv_eoi bit is cleared on each exit.
There will be no exit since there will be no entry. Search for
"goto out" after kvm_lapic_sync_to_vapic().

> It will stay cleared unless we set it here.
> I will add a comment.

--
			Gleb.


* Re: [PATCHv5 5/5] kvm: host side for eoi optimization
From: Michael S. Tsirkin @ 2012-05-31 14:52 UTC
  To: Gleb Natapov
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Marcelo Tosatti, Linus Torvalds, linux-kernel, Thomas Gleixner

On Thu, May 31, 2012 at 01:15:13PM +0300, Gleb Natapov wrote:
> > > > +	    /* Cache not set: safe but we don't bother. */
> > > > +	    unlikely(apic->isr_cache == -1) ||
> > > > +	    /* Need EOI to update ioapic. */
> > > > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> > > > +		return;
> > > > +
> > > > +	pv_eoi_set_pending(apic->vcpu);
> > > > +}
> > > > +
> > > apic_sync_pv_eoi_to_guest() is not paired with
> > > apic_sync_pv_eoi_from_guest() if event injection is canceled.
> > > You can enter guest with stale pv_eoi bit.
> > 
> > Never. The pv_eoi bit is cleared on each exit.
> There will be no exit since there will be no entry. Search for
> "goto out" after kvm_lapic_sync_to_vapic().

I think you've found a bug, thanks a bunch.
I have fixed it, but the x86 guys asked me not to
post more patches until the merge window closes :(
So I'll sit on the fix for several days.
