linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/5] apic: eoi optimization support
@ 2012-05-16 11:45 Michael S. Tsirkin
  2012-05-16 11:45 ` [PATCHv4 1/5] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:45 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

I'm looking at reducing the interrupt overhead for virtualized guests:
some workloads spend a large part of their time processing interrupts.
This patchset supplies infrastructure to reduce the IRQ ack overhead on
x86: the idea is to add an eoi_write callback that we can then optimize
without touching other apic functionality.

The main user is kvm: on kvm, an EOI write from the guest causes an
expensive exit to host; we can avoid this using shared memory as the
last patches in the series demonstrate.

But I also wrote a micro-optimized version for the regular x2apic: this
shaves off a branch and about 9 instructions from EOI when x2apic is
used, and a comment in ack_APIC_irq implies that someone counted
instructions there, at some point.

That's patch 4 in the series - if someone's unhappy with
this patch specifically this patch can be dropped
as nothing else in the series depends on it.

Also included in the patchset are a couple of trivial macro fixes.

The patches work fine on my boxes. See individual patches
for perf tests. You need to patch qemu to whitelist the kvm feature.
qemu patch will be sent as a reply to this mail.

The patches are against 3.4-rc7 - let me know if
I need to rebase.

Please review, and consider for linux 3.5.

Thanks,
MST

Changes from v3:
	Address review comments by Marcelo:
		Multiple cosmetic changes eoi -> pv_eoi
		Added multiple comments
Changes from v2:
	Kill guest with GP on an illegal MSR value
	Add documentation

Changes from v1:
	Add host side patch to series
	Remove kvm-specific __test_and_clear_bit, document
	that x86 one does what we want already
	Clear msr on cpu unplug

Michael S. Tsirkin (5):
  kvm_para: guest side for eoi avoidance
  x86/bitops: note on __test_and_clear_bit atomicity
  kvm: host side for eoi optimization
  kvm: eoi msi documentation
  kvm: only sync when attention bits set

 Documentation/virtual/kvm/msr.txt |   32 +++++++++
 arch/x86/include/asm/bitops.h     |   13 +++-
 arch/x86/include/asm/kvm_host.h   |   12 +++
 arch/x86/include/asm/kvm_para.h   |    7 ++
 arch/x86/kernel/kvm.c             |   51 +++++++++++++-
 arch/x86/kvm/cpuid.c              |    1 +
 arch/x86/kvm/irq.c                |    2 +-
 arch/x86/kvm/lapic.c              |  137 +++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h              |    2 +
 arch/x86/kvm/trace.h              |   34 +++++++++
 arch/x86/kvm/x86.c                |    8 ++-
 11 files changed, 287 insertions(+), 12 deletions(-)

-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCHv4 1/5] kvm_para: guest side for eoi avoidance
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
@ 2012-05-16 11:45 ` Michael S. Tsirkin
  2012-05-16 11:46 ` [PATCHv4 2/5] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:45 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
Guest tests it using a single est and clear operation - this is
necessary so that host can detect interrupt nesting - and if set, it can
skip the EOI MSR.

I run a simple microbenchmark to show exit reduction
(note: for testing, need to apply follow-up patch
'kvm: host side for eoi optimization' + a qemu patch
 I posted separately, on host):

Before:

Performance counter stats for 'sleep 1s':

            47,357 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             5,001 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
            22,124 kvm:kvm_apic                                                 [99.98%]
            49,849 kvm:kvm_exit                                                 [99.98%]
            21,115 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
            22,937 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.98%]
            22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
            22,421 kvm:kvm_eoi                                                  [99.98%]
                 0 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
                57 kvm:kvm_emulate_insn                                         [99.99%]
                 0 kvm:vcpu_match_mmio                                          [99.99%]
                 0 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [99.99%]
               226 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002100578 seconds time elapsed

After:

 Performance counter stats for 'sleep 1s':

            28,354 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             1,347 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
             1,931 kvm:kvm_apic                                                 [99.98%]
            29,595 kvm:kvm_exit                                                 [99.98%]
            24,884 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
             1,986 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.99%]
            25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
            26,132 kvm:kvm_eoi                                                  [99.99%]
            26,593 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
               284 kvm:kvm_emulate_insn                                         [99.99%]
                68 kvm:vcpu_match_mmio                                          [99.99%]
                68 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [100.00%]
               588 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002039622 seconds time elapsed

We see that # of exits is almost halved.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h   |    6 +++-
 arch/x86/include/asm/kvm_para.h |    7 +++++
 arch/x86/kernel/kvm.c           |   51 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index b97596e..c9c70ea 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -26,11 +26,13 @@
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
 /* Technically wrong, but this avoids compilation errors on some gcc
    versions. */
-#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "=m"
 #else
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "+m"
 #endif
 
+#define BITOP_ADDR(x) BITOP_ADDR_CONSTRAINT (*(volatile long *) (x))
+
 #define ADDR				BITOP_ADDR(addr)
 
 /*
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..6203ffb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_EOI		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN      0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -89,6 +91,11 @@ struct kvm_vcpu_pv_apf_data {
 	__u32 enabled;
 };
 
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..85cd6ac 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,8 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 #include <asm/idle.h>
+#include <asm/apic.h>
+#include <asm/apicdef.h>
 
 static int kvmapf = 1;
 
@@ -283,6 +285,24 @@ static void kvm_register_steal_time(void)
 		cpu, __pa(st));
 }
 
+/* size alignment is implied but just to make it explicit. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) =
+	KVM_PV_EOI_DISABLED;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+	/**
+	 * This relies on __test_and_clear_bit to modify the memory
+	 * in a way that is atomic with respect to the local CPU.
+	 * The hypervisor only accesses this memory from the local CPU so
+	 * there's no need for lock or memory barriers.
+	 * An optimization barrier is implied in apic write.
+	 */
+	if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
+		return;
+	apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -300,11 +320,17 @@ void __cpuinit kvm_guest_cpu_init(void)
 		       smp_processor_id());
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		__get_cpu_var(kvm_apic_eoi) = 0;
+		wrmsrl(MSR_KVM_PV_EOI_EN, __pa(&__get_cpu_var(kvm_apic_eoi)) |
+		       KVM_MSR_ENABLED);
+	}
+
 	if (has_steal_clock)
 		kvm_register_steal_time();
 }
 
-static void kvm_pv_disable_apf(void *unused)
+static void kvm_pv_disable_apf(void)
 {
 	if (!__get_cpu_var(apf_reason).enabled)
 		return;
@@ -316,11 +342,18 @@ static void kvm_pv_disable_apf(void *unused)
 	       smp_processor_id());
 }
 
+static void kvm_pv_guest_cpu_reboot(void *unused)
+{
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
+}
+
 static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
 	return NOTIFY_DONE;
 }
 
@@ -371,7 +404,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
 static void kvm_guest_cpu_offline(void *dummy)
 {
 	kvm_disable_steal_time();
-	kvm_pv_disable_apf(NULL);
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
 	apf_task_wake_all();
 }
 
@@ -424,6 +459,16 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		struct apic **drv;
+
+		for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+			/* Should happen once for each apic */
+			WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+			(*drv)->eoi_write = kvm_guest_apic_eoi_write;
+		}
+	}
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
-- 
MST


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv4 2/5] x86/bitops: note on __test_and_clear_bit atomicity
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
  2012-05-16 11:45 ` [PATCHv4 1/5] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
@ 2012-05-16 11:46 ` Michael S. Tsirkin
  2012-05-16 11:46 ` [PATCHv4 3/5] kvm: host side for eoi optimization Michael S. Tsirkin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:46 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

__test_and_clear_bit is actually atomic with respect
to the local CPU. Add a note saying that KVM on x86
relies on this behaviour so people don't accidentaly break it.
Also warn not to rely on this in portable code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index c9c70ea..86f3a1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
  * This operation is non-atomic and can be reordered.
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
+ *
+ * Note: the operation is performed atomically with respect to
+ * the local CPU, but not other CPUs. Portable code should not
+ * rely on this behaviour.
+ * KVM relies on this behaviour on x86 for modifying memory that is also
+ * accessed from a hypervisor on the same CPU if running in a VM: don't change
+ * this without also updating arch/x86/kernel/kvm.c
  */
 static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 {
-- 
MST


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
  2012-05-16 11:45 ` [PATCHv4 1/5] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
  2012-05-16 11:46 ` [PATCHv4 2/5] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
@ 2012-05-16 11:46 ` Michael S. Tsirkin
  2012-05-16 15:49   ` Marcelo Tosatti
  2012-05-17  9:28   ` Avi Kivity
  2012-05-16 11:46 ` [PATCHv4 4/5] kvm: eoi msi documentation Michael S. Tsirkin
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:46 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

Implementation of PV EOI using shared memory.
This reduces the number of exits an interrupt
causes as much as by half.

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear
before injecting a nested one. Guest tests it using
a test and clear operation - this is necessary
so that host can detect interrupt nesting -
and if set, it can skip the EOI MSR.

There's a new MSR to set the address of said register
in guest memory. Otherwise not much changed:
- Guest EOI is not required
- Register is tested & ISR is automatically cleared on exit

For testing results see description of previous patch
'kvm_para: guest side for eoi avoidance'.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   12 ++++
 arch/x86/kvm/cpuid.c            |    1 +
 arch/x86/kvm/irq.c              |    2 +-
 arch/x86/kvm/lapic.c            |  137 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h            |    2 +
 arch/x86/kvm/trace.h            |   34 ++++++++++
 arch/x86/kvm/x86.c              |    5 ++
 7 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32297f2..3ded418 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -174,6 +174,13 @@ enum {
 
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
+/*
+ * The following bit is set with PV-EOI, unset on EOI.
+ * We detect PV-EOI changes by guest by comparing
+ * this bit with PV-EOI in guest memory.
+ * See the implementation in apic_update_pv_eoi.
+ */ 
+#define KVM_APIC_PV_EOI_PENDING	1
 
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
@@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+	} pv_eoi;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eba8345..a9f155a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
+			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 
 		if (sched_info_on())
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 7e06ba1..07bdfab 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;
 
-	if (kvm_apic_has_interrupt(v) == -1) {	/* LAPIC */
+	if (kvm_apic_has_interrupt(v) < 0) {	/* LAPIC */
 		if (kvm_apic_accept_pic_intr(v)) {
 			s = pic_irqchip(v->kvm);	/* PIC */
 			return s->output;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c1574..b4f7013 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 			irq->level, irq->trig_mode);
 }
 
+static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
+				      sizeof(val));
+}
+
+static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
+{
+
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
+				      sizeof(*val));
+}
+
+static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
+}
+
+static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
+{
+	u8 val;
+	if (pv_eoi_get_user(vcpu, &val) < 0)
+		apic_debug("Can't read EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+	return val & 0x1;
+}
+
+static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
+		apic_debug("Can't set EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+		return;
+	}
+	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
+static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
+		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+		return;
+	}
+	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
@@ -482,16 +530,26 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
-static void apic_set_eoi(struct kvm_lapic *apic)
+static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
+
+	trace_kvm_eoi(apic, vector);
+
 	/*
 	 * Not every write EOI will has corresponding ISR,
 	 * one example is when Kernel check timer on setup_IO_APIC
 	 */
 	if (vector == -1)
-		return;
+		return vector;
 
+	/*
+	 * It's legal for guest to ignore the PV EOI optimization
+	 * and signal EOI by APIC write. If this happens, clear
+	 * PV EOI on guest's behalf.
+	 */
+	if (pv_eoi_enabled(apic->vcpu))
+		pv_eoi_clr_pending(apic->vcpu);
 	apic_clear_vector(vector, apic->regs + APIC_ISR);
 	apic_update_ppr(apic);
 
@@ -505,6 +563,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
 		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
 	}
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+	return vector;
 }
 
 static void apic_send_ipi(struct kvm_lapic *apic)
@@ -1085,6 +1144,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	atomic_set(&apic->lapic_timer.pending, 0);
 	if (kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
 
 	vcpu->arch.apic_arb_prio = 0;
@@ -1211,9 +1271,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 
 	apic_update_ppr(apic);
 	highest_irr = apic_find_highest_irr(apic);
-	if ((highest_irr == -1) ||
-	    ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+	if (highest_irr == -1)
 		return -1;
+	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+		return -2;
 	return highest_irr;
 }
 
@@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	int vector = kvm_apic_has_interrupt(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	if (vector == -1)
+	/* Detect interrupt nesting and disable EOI optimization */
+	if (pv_eoi_enabled(vcpu) && vector == -2)
+		pv_eoi_clr_pending(vcpu);
+
+	if (vector < 0)
 		return -1;
 
+	if (pv_eoi_enabled(vcpu)) {
+		if (kvm_ioapic_handles_vector(vcpu->kvm, vector))
+			pv_eoi_clr_pending(vcpu);
+		else
+			pv_eoi_set_pending(vcpu);
+	}
+
 	apic_set_vector(vector, apic->regs + APIC_ISR);
 	apic_update_ppr(apic);
 	apic_clear_irr(vector, apic);
@@ -1262,6 +1334,18 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
 			     MSR_IA32_APICBASE_BASE;
 	kvm_apic_set_version(vcpu);
 
+	if (pv_eoi_enabled(apic->vcpu)) {
+		/*
+		 * Update shadow bit in apic_attention bitmask
+		 * from guest memory.
+		 */
+		if (pv_eoi_get_pending(apic->vcpu))
+			__set_bit(KVM_APIC_PV_EOI_PENDING,
+				  &vcpu->arch.apic_attention);
+		else
+			__clear_bit(KVM_APIC_PV_EOI_PENDING,
+				    &vcpu->arch.apic_attention);
+	}
 	apic_update_ppr(apic);
 	hrtimer_cancel(&apic->lapic_timer.timer);
 	update_divide_count(apic);
@@ -1283,11 +1367,42 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
+/*
+ * apic_update_pv_eoi - called on vmexit
+ *
+ * Detect whether guest triggered PV EOI since the
+ * last entry. If yes, set EOI on guests's behalf.
+ */
+static void apic_update_pv_eoi(struct kvm_lapic *apic)
+{
+	/*
+	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
+	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
+	 *
+	 * KVM_APIC_PV_EOI_PENDING is unset:
+	 * 	-> host disabled PV EOI.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
+	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
+	 * 	-> host enabled PV EOI, guest executed EOI.
+	 */
+	int vector;
+	BUG_ON(!pv_eoi_enabled(apic->vcpu));
+	if (pv_eoi_get_pending(apic->vcpu))
+		return;
+	__clear_bit(KVM_APIC_PV_EOI_PENDING, &apic->vcpu->arch.apic_attention);
+	vector = apic_set_eoi(apic);
+	trace_kvm_pv_eoi(apic, vector);
+}
+
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
 {
 	u32 data;
 	void *vapic;
 
+	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
+		apic_update_pv_eoi(vcpu->arch.apic);
+
 	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
 		return;
 
@@ -1394,3 +1509,15 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 
 	return 0;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 addr = data & ~KVM_MSR_ENABLED;
+	if (pv_eoi_enabled(vcpu))
+		pv_eoi_clr_pending(vcpu);
+	vcpu->arch.pv_eoi.msr_val = data;
+	if (!pv_eoi_enabled(vcpu))
+		return 0;
+	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+					 addr);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6f4ce25..149ef01 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -60,4 +60,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
 #endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..851914e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
 		  __entry->coalesced ? " (coalesced)" : "")
 );
 
+TRACE_EVENT(kvm_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
+TRACE_EVENT(kvm_pv_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
 /*
  * Tracepoint for nested VMRUN
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 185a2b8..4d77af0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_EOI_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
+	case MSR_KVM_PV_EOI_EN:
+		if (kvm_lapic_enable_pv_eoi(vcpu, data))
+			return 1;
+		break;
 
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
-- 
MST


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv4 4/5] kvm: eoi msi documentation
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2012-05-16 11:46 ` [PATCHv4 3/5] kvm: host side for eoi optimization Michael S. Tsirkin
@ 2012-05-16 11:46 ` Michael S. Tsirkin
  2012-05-16 11:46 ` [PATCHv4 5/5] kvm: only sync when attention bits set Michael S. Tsirkin
  2012-05-16 15:41 ` [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
  5 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:46 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

Document the new EOI MSR. Couldn't decide whether this change belongs
conceptually on guest or host side, so a separate patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Documentation/virtual/kvm/msr.txt |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 5031780..3481f1b 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,35 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_EOI_EN: 0x4b564d04
+	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
+	when disabled.  When enabled, bits 63-1 hold 2-byte aligned physical address
+	of a 2 byte memory area which must be in guest RAM and must be zeroed.
+
+	The first, least significant bit of 2 byte memory location will be
+	written to by the hypervisor, typically at the time of interrupt
+	injection.  Value of 1 means that guest can skip writing EOI to the apic
+	(using MSR or MMIO write); instead, it is sufficient to signal
+	EOI by clearing the bit in guest memory - this location will
+	later be polled by the hypervisor.
+	Value of 0 means that the EOI write is required.
+
+	It is always safe for the guest to ignore the optimization and perform
+	the APIC EOI write anyway.
+
+	Hypervisor is guaranteed to only modify this least
+	significant bit while in the current VCPU context, this means that
+	guest does not need to use either lock prefix or memory ordering
+	primitives to synchronise with the hypervisor.
+
+	However, hypervisor can set and clear this memory bit at any time:
+	therefore to make sure hypervisor does not interrupt the
+	guest and clear the least significant bit in the memory area
+	in the window between guest testing it to detect
+	whether it can skip EOI apic write and between guest
+	clearing it to signal EOI to the hypervisor,
+	guest must both read the least sgnificant bit in the memory area and
+	clear it using a single CPU instruction, such as test and clear, or
+	compare and exchange.
+
-- 
MST


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv4 5/5] kvm: only sync when attention bits set
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2012-05-16 11:46 ` [PATCHv4 4/5] kvm: eoi msi documentation Michael S. Tsirkin
@ 2012-05-16 11:46 ` Michael S. Tsirkin
  2012-05-16 15:41 ` [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
  5 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 11:46 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
attention bitmask but kvm still syncs lapic unconditionally.
As that commit suggested and in anticipation of adding more attention
bits, only sync lapic if(apic_attention).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/x86.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4d77af0..4fa26f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5375,7 +5375,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(vcpu->arch.tsc_always_catchup))
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
-	kvm_lapic_sync_from_vapic(vcpu);
+	if (unlikely(vcpu->arch.apic_attention))
+		kvm_lapic_sync_from_vapic(vcpu);
 
 	r = kvm_x86_ops->handle_exit(vcpu);
 out:
-- 
MST

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 0/5] apic: eoi optimization support
  2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2012-05-16 11:46 ` [PATCHv4 5/5] kvm: only sync when attention bits set Michael S. Tsirkin
@ 2012-05-16 15:41 ` Michael S. Tsirkin
  5 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 15:41 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:45:50PM +0300, Michael S. Tsirkin wrote:
> I'm looking at reducing the interrupt overhead for virtualized guests:
> some workloads spend a large part of their time processing interrupts.
> This patchset supplies infrastructure to reduce the IRQ ack overhead on
> x86: the idea is to add an eoi_write callback that we can then optimize
> without touching other apic functionality.

Ugh. I lost first part of the patchset.
Sorry.
Will send a fixed one now.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 11:46 ` [PATCHv4 3/5] kvm: host side for eoi optimization Michael S. Tsirkin
@ 2012-05-16 15:49   ` Marcelo Tosatti
  2012-05-16 16:22     ` Michael S. Tsirkin
  2012-05-17  9:28   ` Avi Kivity
  1 sibling, 1 reply; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 15:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:46:12PM +0300, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes as much as by half.
> 
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
> 
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
> 
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   12 ++++
>  arch/x86/kvm/cpuid.c            |    1 +
>  arch/x86/kvm/irq.c              |    2 +-
>  arch/x86/kvm/lapic.c            |  137 +++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/lapic.h            |    2 +
>  arch/x86/kvm/trace.h            |   34 ++++++++++
>  arch/x86/kvm/x86.c              |    5 ++
>  7 files changed, 187 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32297f2..3ded418 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -174,6 +174,13 @@ enum {
>  
>  /* apic attention bits */
>  #define KVM_APIC_CHECK_VAPIC	0
> +/*
> + * The following bit is set with PV-EOI, unset on EOI.
> + * We detect PV-EOI changes by guest by comparing
> + * this bit with PV-EOI in guest memory.
> + * See the implementation in apic_update_pv_eoi.
> + */ 
> +#define KVM_APIC_PV_EOI_PENDING	1
>  
>  /*
>   * We don't want allocation failures within the mmu code, so we preallocate
> @@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
>  		u64 length;
>  		u64 status;
>  	} osvw;
> +
> +	struct {
> +		u64 msr_val;
> +		struct gfn_to_hva_cache data;
> +	} pv_eoi;
>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index eba8345..a9f155a 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> +			     (1 << KVM_FEATURE_PV_EOI) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>  
>  		if (sched_info_on())
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 7e06ba1..07bdfab 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
>  	if (!irqchip_in_kernel(v->kvm))
>  		return v->arch.interrupt.pending;
>  
> -	if (kvm_apic_has_interrupt(v) == -1) {	/* LAPIC */
> +	if (kvm_apic_has_interrupt(v) < 0) {	/* LAPIC */
>  		if (kvm_apic_accept_pic_intr(v)) {
>  			s = pic_irqchip(v->kvm);	/* PIC */
>  			return s->output;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..b4f7013 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>  			irq->level, irq->trig_mode);
>  }
>  
> +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> +{
> +
> +	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
> +				      sizeof(val));
> +}
> +
> +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> +{
> +
> +	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
> +				      sizeof(*val));
> +}
> +
> +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
> +}
> +
> +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
> +{
> +	u8 val;
> +	if (pv_eoi_get_user(vcpu, &val) < 0)
> +		apic_debug("Can't read EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> +	return val & 0x1;
> +}
> +
> +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
> +		apic_debug("Can't set EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
> +		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>  	int result;
> @@ -482,16 +530,26 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>  	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>  }
>  
> -static void apic_set_eoi(struct kvm_lapic *apic)
> +static int apic_set_eoi(struct kvm_lapic *apic)
>  {
>  	int vector = apic_find_highest_isr(apic);
> +
> +	trace_kvm_eoi(apic, vector);
> +
>  	/*
>  	 * Not every write EOI will has corresponding ISR,
>  	 * one example is when Kernel check timer on setup_IO_APIC
>  	 */
>  	if (vector == -1)
> -		return;
> +		return vector;
>  
> +	/*
> +	 * It's legal for guest to ignore the PV EOI optimization
> +	 * and signal EOI by APIC write. If this happens, clear
> +	 * PV EOI on guest's behalf.
> +	 */
> +	if (pv_eoi_enabled(apic->vcpu))
> +		pv_eoi_clr_pending(apic->vcpu);


>  	apic_clear_vector(vector, apic->regs + APIC_ISR);
>  	apic_update_ppr(apic);
>  
> @@ -505,6 +563,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
>  		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>  	}
>  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> +	return vector;
>  }
>  
>  static void apic_send_ipi(struct kvm_lapic *apic)
> @@ -1085,6 +1144,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  	if (kvm_vcpu_is_bsp(vcpu))
>  		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> +	vcpu->arch.pv_eoi.msr_val = 0;
>  	apic_update_ppr(apic);
>  
>  	vcpu->arch.apic_arb_prio = 0;
> @@ -1211,9 +1271,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>  
>  	apic_update_ppr(apic);
>  	highest_irr = apic_find_highest_irr(apic);
> -	if ((highest_irr == -1) ||
> -	    ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> +	if (highest_irr == -1)
>  		return -1;
> +	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> +		return -2;
>  	return highest_irr;
>  }
>  
> @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  	int vector = kvm_apic_has_interrupt(vcpu);
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -	if (vector == -1)
> +	/* Detect interrupt nesting and disable EOI optimization */
> +	if (pv_eoi_enabled(vcpu) && vector == -2)
> +		pv_eoi_clr_pending(vcpu);
> +
> +	if (vector < 0)

With interrupt window exiting, the guest will exit:

- as soon as it sets RFLAGS.IF=1 and there is any 
interrupt pending in IRR.
- any new interrupt is set in IRR will kick vcpu
out of guest mode and recalculate interrupt-window-exiting.

Doesnt this make this bit unnecessary ?



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 15:49   ` Marcelo Tosatti
@ 2012-05-16 16:22     ` Michael S. Tsirkin
  2012-05-16 16:32       ` Gleb Natapov
  2012-05-16 17:20       ` Marcelo Tosatti
  0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 16:22 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> >  	int vector = kvm_apic_has_interrupt(vcpu);
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> >  
> > -	if (vector == -1)
> > +	/* Detect interrupt nesting and disable EOI optimization */
> > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > +		pv_eoi_clr_pending(vcpu);
> > +
> > +	if (vector < 0)
> 
> With interrupt window exiting, the guest will exit:
> 
> - as soon as it sets RFLAGS.IF=1 and there is any 
> interrupt pending in IRR.
> - any new interrupt is set in IRR will kick vcpu
> out of guest mode and recalculate interrupt-window-exiting.
> 
> Doesnt this make this bit unnecessary ?

Looks like we could cut it out.  But I'm not sure how architectural it is
that we exit on interrupt window.
I guess there are reasons to exit on interrupt window but
isn't it better to make the feature independent of it?

This almost never happens in my testing anyway, so
however we handle it is unlikely to affect performance.

-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 16:22     ` Michael S. Tsirkin
@ 2012-05-16 16:32       ` Gleb Natapov
  2012-05-16 16:50         ` Michael S. Tsirkin
  2012-05-16 17:20       ` Marcelo Tosatti
  1 sibling, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 16:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > >  
> > > -	if (vector == -1)
> > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > +		pv_eoi_clr_pending(vcpu);
> > > +
> > > +	if (vector < 0)
> > 
> > With interrupt window exiting, the guest will exit:
> > 
> > - as soon as it sets RFLAGS.IF=1 and there is any 
> > interrupt pending in IRR.
> > - any new interrupt is set in IRR will kick vcpu
> > out of guest mode and recalculate interrupt-window-exiting.
> > 
> > Doesnt this make this bit unnecessary ?
> 
> Looks like we could cut it out.  But I'm not sure how architectural it is
> that we exit on interrupt window.
We request exit on interrupt window only if there is pending irq that
can be delivered on a guest entry.

> I guess there are reasons to exit on interrupt window but
> isn't it better to make the feature independent of it?
> 
> This almost never happens in my testing anyway, so
> however we handle it is unlikely to affect performance.
> 
> -- 
> MST

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 16:32       ` Gleb Natapov
@ 2012-05-16 16:50         ` Michael S. Tsirkin
  2012-05-16 17:04           ` Marcelo Tosatti
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 16:50 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > >  
> > > > -	if (vector == -1)
> > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > +		pv_eoi_clr_pending(vcpu);
> > > > +
> > > > +	if (vector < 0)
> > > 
> > > With interrupt window exiting, the guest will exit:
> > > 
> > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > interrupt pending in IRR.
> > > - any new interrupt is set in IRR will kick vcpu
> > > out of guest mode and recalculate interrupt-window-exiting.
> > > 
> > > Doesnt this make this bit unnecessary ?
> > 
> > Looks like we could cut it out.  But I'm not sure how architectural it is
> > that we exit on interrupt window.
> We request exit on interrupt window only if there is pending irq that
> can be delivered on a guest entry.

Aha. If so what Marcelo proposed won't work I think: if we inject A then B
which is lower priority, we need an exit on EOI, we can't inject
immediately.

> > I guess there are reasons to exit on interrupt window but
> > isn't it better to make the feature independent of it?
> > 
> > This almost never happens in my testing anyway, so
> > however we handle it is unlikely to affect performance.
> > 
> > -- 
> > MST
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 16:50         ` Michael S. Tsirkin
@ 2012-05-16 17:04           ` Marcelo Tosatti
  2012-05-16 17:21             ` Gleb Natapov
  0 siblings, 1 reply; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 17:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gleb Natapov, x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > >  
> > > > > -	if (vector == -1)
> > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > +
> > > > > +	if (vector < 0)
> > > > 
> > > > With interrupt window exiting, the guest will exit:
> > > > 
> > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > interrupt pending in IRR.
> > > > - any new interrupt is set in IRR will kick vcpu
> > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > 
> > > > Doesnt this make this bit unnecessary ?
> > > 
> > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > that we exit on interrupt window.
> > We request exit on interrupt window only if there is pending irq that
> > can be delivered on a guest entry.
> 
> Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> which is lower priority, we need an exit on EOI, we can't inject
> immediately.

Please describe the scenario clearly, i can't see the problem.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 16:22     ` Michael S. Tsirkin
  2012-05-16 16:32       ` Gleb Natapov
@ 2012-05-16 17:20       ` Marcelo Tosatti
  2012-05-16 17:58         ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 17:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > >  
> > > -	if (vector == -1)
> > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > +		pv_eoi_clr_pending(vcpu);
> > > +
> > > +	if (vector < 0)
> > 
> > With interrupt window exiting, the guest will exit:
> > 
> > - as soon as it sets RFLAGS.IF=1 and there is any 
> > interrupt pending in IRR.
> > - any new interrupt is set in IRR will kick vcpu
> > out of guest mode and recalculate interrupt-window-exiting.
> > 
> > Doesnt this make this bit unnecessary ?
> 
> Looks like we could cut it out.  But I'm not sure how architectural it is
> that we exit on interrupt window.
> I guess there are reasons to exit on interrupt window but
> isn't it better to make the feature independent of it?

Hum... not sure. Is it helpful for the Hyper-V interface?

> This almost never happens in my testing anyway, so
> however we handle it is unlikely to affect performance.

It decreases the amount of state that must be maintained.

BTW there is a bug covered by interrupt window exiting:

vcpu0                               host
- irr 5 set
- isr 5 set, irr 5 cleared
- eoi_skip bit not set, 
no other bit set in irr.
- enter guest

                                    irr 4 set
                                    kick vcpu0 out of guest mode

- eoi pending bit not set
  (previous interrupt injection 
   still pending)
- skip eoi

If it were not for interrupt window exiting, this would 
inject vector 4 on an unrelated exit who knows how long 
in the future.

Also note optimization depends on the fact that the host 
kicks vcpu out unconditionally (so it is dependent on 
certain kvm implementation details).





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:04           ` Marcelo Tosatti
@ 2012-05-16 17:21             ` Gleb Natapov
  2012-05-16 17:23               ` Marcelo Tosatti
  0 siblings, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 17:21 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >  
> > > > > > -	if (vector == -1)
> > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > +	if (vector < 0)
> > > > > 
> > > > > With interrupt window exiting, the guest will exit:
> > > > > 
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > 
> > > > > Doesnt this make this bit unnecessary ?
> > > > 
> > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > We request exit on interrupt window only if there is pending irq that
> > > can be delivered on a guest entry.
> > 
> > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > which is lower priority, we need an exit on EOI, we can't inject
> > immediately.
> 
> Please describe the scenario clearly, i can't see the problem.

During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
but irq window is not requested because 100 can't be injected, During
EOI exit 100 is injected.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:21             ` Gleb Natapov
@ 2012-05-16 17:23               ` Marcelo Tosatti
  2012-05-16 17:34                 ` Gleb Natapov
  2012-05-16 17:53                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 17:23 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > >  
> > > > > > > -	if (vector == -1)
> > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > +
> > > > > > > +	if (vector < 0)
> > > > > > 
> > > > > > With interrupt window exiting, the guest will exit:
> > > > > > 
> > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > interrupt pending in IRR.
> > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > 
> > > > > > Doesnt this make this bit unnecessary ?
> > > > > 
> > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > that we exit on interrupt window.
> > > > We request exit on interrupt window only if there is pending irq that
> > > > can be delivered on a guest entry.
> > > 
> > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > which is lower priority, we need an exit on EOI, we can't inject
> > > immediately.
> > 
> > Please describe the scenario clearly, i can't see the problem.
> 
> During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> but irq window is not requested because 100 can't be injected, During
> EOI exit 100 is injected.

interrupt window exiting is always requested if IRR is pending. Except 
if NMI window is requested (which has higher priority).

What am i missing here?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:23               ` Marcelo Tosatti
@ 2012-05-16 17:34                 ` Gleb Natapov
  2012-05-16 17:40                   ` Marcelo Tosatti
  2012-05-16 17:53                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 17:34 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >  
> > > > > > > > -	if (vector == -1)
> > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > +	if (vector < 0)
> > > > > > > 
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > 
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > 
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > 
> > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > We request exit on interrupt window only if there is pending irq that
> > > > > can be delivered on a guest entry.
> > > > 
> > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > immediately.
> > > 
> > > Please describe the scenario clearly, i can't see the problem.
> > 
> > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > but irq window is not requested because 100 can't be injected, During
> > EOI exit 100 is injected.
> 
> interrupt window exiting is always requested if IRR is pending. Except 
> if NMI window is requested (which has higher priority).
> 
This is where we enable irq window. We do it only if there is interrupt
pending:
                if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
                        kvm_x86_ops->enable_irq_window(vcpu);

kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt() 


kvm_apic_has_interrupt():
        apic_update_ppr(apic);
        highest_irr = apic_find_highest_irr(apic);
        if ((highest_irr == -1) ||
            ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
                return -1;
And above checks IRR priority, so we can have IRR set and do not enable
irq window exit.

> What am i missing here?

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:34                 ` Gleb Natapov
@ 2012-05-16 17:40                   ` Marcelo Tosatti
  2012-05-16 17:48                     ` Gleb Natapov
  0 siblings, 1 reply; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 17:40 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 08:34:23PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > >  
> > > > > > > > > -	if (vector == -1)
> > > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > > +
> > > > > > > > > +	if (vector < 0)
> > > > > > > > 
> > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > > 
> > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > > interrupt pending in IRR.
> > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > > 
> > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > > 
> > > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > > that we exit on interrupt window.
> > > > > > We request exit on interrupt window only if there is pending irq that
> > > > > > can be delivered on a guest entry.
> > > > > 
> > > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > > immediately.
> > > > 
> > > > Please describe the scenario clearly, i can't see the problem.
> > > 
> > > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > > but irq window is not requested because 100 can't be injected, During
> > > EOI exit 100 is injected.
> > 
> > interrupt window exiting is always requested if IRR is pending. Except 
> > if NMI window is requested (which has higher priority).
> > 
> This is where we enable irq window. We do it only if there is interrupt
> pending:
>                 if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>                         kvm_x86_ops->enable_irq_window(vcpu);
> 
> kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt() 
> 
> 
> kvm_apic_has_interrupt():
>         apic_update_ppr(apic);
>         highest_irr = apic_find_highest_irr(apic);
>         if ((highest_irr == -1) ||
>             ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
>                 return -1;
> And above checks IRR priority, so we can have IRR set and do not enable
> irq window exit.

Right, but then you cannot inject interrupt anyway so EOI is not
necessary. Instead TPR-below-threshold trap is set which handles
that case. No?




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:40                   ` Marcelo Tosatti
@ 2012-05-16 17:48                     ` Gleb Natapov
  0 siblings, 0 replies; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 17:48 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:40:22PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:34:23PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > > >  
> > > > > > > > > > -	if (vector == -1)
> > > > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > > > +
> > > > > > > > > > +	if (vector < 0)
> > > > > > > > > 
> > > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > > > 
> > > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > > > interrupt pending in IRR.
> > > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > > > 
> > > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > > > 
> > > > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > > > that we exit on interrupt window.
> > > > > > > We request exit on interrupt window only if there is pending irq that
> > > > > > > can be delivered on a guest entry.
> > > > > > 
> > > > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > > > immediately.
> > > > > 
> > > > > Please describe the scenario clearly, i can't see the problem.
> > > > 
> > > > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > > > but irq window is not requested because 100 can't be injected, During
> > > > EOI exit 100 is injected.
> > > 
> > > interrupt window exiting is always requested if IRR is pending. Except 
> > > if NMI window is requested (which has higher priority).
> > > 
> > This is where we enable irq window. We do it only if there is interrupt
> > pending:
> >                 if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >                         kvm_x86_ops->enable_irq_window(vcpu);
> > 
> > kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt() 
> > 
> > 
> > kvm_apic_has_interrupt():
> >         apic_update_ppr(apic);
> >         highest_irr = apic_find_highest_irr(apic);
> >         if ((highest_irr == -1) ||
> >             ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> >                 return -1;
> > And above checks IRR priority, so we can have IRR set and do not enable
> > irq window exit.
> 
> Right, but then you cannot inject interrupt anyway so EOI is not
> necessary. Instead TPR-below-threshold trap is set which handles
> that case. No?
> 
No. EOI clears ISR -> PROCPRI is recalculated -> pending interrupt is
injected.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:23               ` Marcelo Tosatti
  2012-05-16 17:34                 ` Gleb Natapov
@ 2012-05-16 17:53                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 17:53 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Gleb Natapov, x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >  
> > > > > > > > -	if (vector == -1)
> > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > +	if (vector < 0)
> > > > > > > 
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > 
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > 
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > 
> > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > We request exit on interrupt window only if there is pending irq that
> > > > > can be delivered on a guest entry.
> > > > 
> > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > immediately.
> > > 
> > > Please describe the scenario clearly, i can't see the problem.
> > 
> > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > but irq window is not requested because 100 can't be injected, During
> > EOI exit 100 is injected.
> 
> interrupt window exiting is always requested if IRR is pending. Except 
> if NMI window is requested (which has higher priority).
> 
> What am i missing here?

That if EOI was not yet signalled by guest when we inject
the low priority irq, then windows exiting is not
enough we need an exit on EOI.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:20       ` Marcelo Tosatti
@ 2012-05-16 17:58         ` Michael S. Tsirkin
  2012-05-16 18:15           ` Marcelo Tosatti
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 17:58 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > >  
> > > > -	if (vector == -1)
> > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > +		pv_eoi_clr_pending(vcpu);
> > > > +
> > > > +	if (vector < 0)
> > > 
> > > With interrupt window exiting, the guest will exit:
> > > 
> > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > interrupt pending in IRR.
> > > - any new interrupt is set in IRR will kick vcpu
> > > out of guest mode and recalculate interrupt-window-exiting.
> > > 
> > > Doesnt this make this bit unnecessary ?
> > 
> > Looks like we could cut it out.  But I'm not sure how architectural it is
> > that we exit on interrupt window.
> > I guess there are reasons to exit on interrupt window but
> > isn't it better to make the feature independent of it?
> 
> Hum... not sure. Is it helpful for the Hyper-V interface?
> 
> > This almost never happens in my testing anyway, so
> > however we handle it is unlikely to affect performance.
> 
> It decreases the amount of state that must be maintained.
> 
> BTW there is a bug covered by interrupt window exiting:
> 
> vcpu0                               host
> - irr 5 set
> - isr 5 set, irr 5 cleared
> - eoi_skip bit not set, 
> no other bit set in irr.
> - enter guest
> 
>                                     irr 4 set
>                                     kick vcpu0 out of guest mode
> 
> - eoi pending bit not set
>   (previous interrupt injection 
>    still pending)
> - skip eoi
> 
> If it were not for interrupt window exiting, this would 
> inject vector 4 on an unrelated exit who knows how long 
> in the future.
> 
> Also note optimization depends on the fact that the host 
> kicks vcpu out unconditionally (so it is dependent on 
> certain kvm implementation details).
> 
> 
> 



Look we can summarize as follows: irq windows exit is
required both before and after this patch.
But it does not make the check above redundant.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 17:58         ` Michael S. Tsirkin
@ 2012-05-16 18:15           ` Marcelo Tosatti
  2012-05-16 18:25             ` Gleb Natapov
                               ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 18:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > >  
> > > > > -	if (vector == -1)
> > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > +
> > > > > +	if (vector < 0)
> > > > 
> > > > With interrupt window exiting, the guest will exit:
> > > > 
> > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > interrupt pending in IRR.
> > > > - any new interrupt is set in IRR will kick vcpu
> > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > 
> > > > Doesnt this make this bit unnecessary ?
> > > 
> > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > that we exit on interrupt window.
> > > I guess there are reasons to exit on interrupt window but
> > > isn't it better to make the feature independent of it?
> > 
> > Hum... not sure. Is it helpful for the Hyper-V interface?
> > 
> > > This almost never happens in my testing anyway, so
> > > however we handle it is unlikely to affect performance.
> > 
> > It decreases the amount of state that must be maintained.
> > 
> > BTW there is a bug covered by interrupt window exiting:
> > 
> > vcpu0                               host
> > - irr 5 set
> > - isr 5 set, irr 5 cleared
> > - eoi_skip bit not set, 
> > no other bit set in irr.
> > - enter guest
> > 
> >                                     irr 4 set
> >                                     kick vcpu0 out of guest mode
> > 
> > - eoi pending bit not set
> >   (previous interrupt injection 
> >    still pending)
> > - skip eoi
> > 
> > If it were not for interrupt window exiting, this would 
> > inject vector 4 on an unrelated exit who knows how long 
> > in the future.
> > 
> > Also note optimization depends on the fact that the host 
> > kicks vcpu out unconditionally (so it is dependent on 
> > certain kvm implementation details).
> > 
> 
> Look we can summarize as follows: irq windows exit is
> required both before and after this patch.
> But it does not make the check above redundant.

Right, it is not redundant.

The above is still a bug: a case where eoi pending bit is not updated
properly.

How come is this compatible with hyper-v again? Enlight me.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:15           ` Marcelo Tosatti
@ 2012-05-16 18:25             ` Gleb Natapov
  2012-05-16 18:29               ` Michael S. Tsirkin
  2012-05-16 18:28             ` Michael S. Tsirkin
  2012-05-16 18:35             ` Michael S. Tsirkin
  2 siblings, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 18:25 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Michael S. Tsirkin, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >  
> > > > > > -	if (vector == -1)
> > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > +	if (vector < 0)
> > > > > 
> > > > > With interrupt window exiting, the guest will exit:
> > > > > 
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > 
> > > > > Doesnt this make this bit unnecessary ?
> > > > 
> > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > > 
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > 
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > > 
> > > It decreases the amount of state that must be maintained.
> > > 
> > > BTW there is a bug covered by interrupt window exiting:
> > > 
> > > vcpu0                               host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set, 
> > > no other bit set in irr.
> > > - enter guest
> > > 
> > >                                     irr 4 set
> > >                                     kick vcpu0 out of guest mode
> > > 
> > > - eoi pending bit not set
> > >   (previous interrupt injection 
> > >    still pending)
> > > - skip eoi
> > > 
> > > If it were not for interrupt window exiting, this would 
> > > inject vector 4 on an unrelated exit who knows how long 
> > > in the future.
> > > 
> > > Also note optimization depends on the fact that the host 
> > > kicks vcpu out unconditionally (so it is dependent on 
> > > certain kvm implementation details).
> > > 
> > 
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
> 
> Right, it is not redundant.
> 
> The above is still a bug: a case where eoi pending bit is not updated
> properly.
When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
Michael does your patch do that?

> 
> How come is this compatible with hyper-v again? Enlight me.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:15           ` Marcelo Tosatti
  2012-05-16 18:25             ` Gleb Natapov
@ 2012-05-16 18:28             ` Michael S. Tsirkin
  2012-05-16 18:35             ` Michael S. Tsirkin
  2 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 18:28 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >  
> > > > > > -	if (vector == -1)
> > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > +	if (vector < 0)
> > > > > 
> > > > > With interrupt window exiting, the guest will exit:
> > > > > 
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > 
> > > > > Doesnt this make this bit unnecessary ?
> > > > 
> > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > > 
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > 
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > > 
> > > It decreases the amount of state that must be maintained.
> > > 
> > > BTW there is a bug covered by interrupt window exiting:
> > > 
> > > vcpu0                               host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set, 
> > > no other bit set in irr.
> > > - enter guest
> > > 
> > >                                     irr 4 set
> > >                                     kick vcpu0 out of guest mode
> > > 
> > > - eoi pending bit not set
> > >   (previous interrupt injection 
> > >    still pending)
> > > - skip eoi
> > > 
> > > If it were not for interrupt window exiting, this would 
> > > inject vector 4 on an unrelated exit who knows how long 
> > > in the future.
> > > 
> > > Also note optimization depends on the fact that the host 
> > > kicks vcpu out unconditionally (so it is dependent on 
> > > certain kvm implementation details).
> > > 
> > 
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
> 
> Right, it is not redundant.
> 
> The above is still a bug: a case where eoi pending bit is not updated
> properly.
> 
> How come is this compatible with hyper-v again? Enlight me.

Set kvm
to point pv_eoi msr data to va and it will just work.

-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:25             ` Gleb Natapov
@ 2012-05-16 18:29               ` Michael S. Tsirkin
  2012-05-16 18:38                 ` Gleb Natapov
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 18:29 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > >  
> > > > > > > -	if (vector == -1)
> > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > +
> > > > > > > +	if (vector < 0)
> > > > > > 
> > > > > > With interrupt window exiting, the guest will exit:
> > > > > > 
> > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > interrupt pending in IRR.
> > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > 
> > > > > > Doesnt this make this bit unnecessary ?
> > > > > 
> > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > that we exit on interrupt window.
> > > > > I guess there are reasons to exit on interrupt window but
> > > > > isn't it better to make the feature independent of it?
> > > > 
> > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > > 
> > > > > This almost never happens in my testing anyway, so
> > > > > however we handle it is unlikely to affect performance.
> > > > 
> > > > It decreases the amount of state that must be maintained.
> > > > 
> > > > BTW there is a bug covered by interrupt window exiting:
> > > > 
> > > > vcpu0                               host
> > > > - irr 5 set
> > > > - isr 5 set, irr 5 cleared
> > > > - eoi_skip bit not set, 
> > > > no other bit set in irr.
> > > > - enter guest
> > > > 
> > > >                                     irr 4 set
> > > >                                     kick vcpu0 out of guest mode
> > > > 
> > > > - eoi pending bit not set
> > > >   (previous interrupt injection 
> > > >    still pending)
> > > > - skip eoi
> > > > 
> > > > If it were not for interrupt window exiting, this would 
> > > > inject vector 4 on an unrelated exit who knows how long 
> > > > in the future.
> > > > 
> > > > Also note optimization depends on the fact that the host 
> > > > kicks vcpu out unconditionally (so it is dependent on 
> > > > certain kvm implementation details).
> > > > 
> > > 
> > > Look we can summarize as follows: irq windows exit is
> > > required both before and after this patch.
> > > But it does not make the check above redundant.
> > 
> > Right, it is not redundant.
> > 
> > The above is still a bug: a case where eoi pending bit is not updated
> > properly.
> When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> Michael does your patch do that?

I think this does it:
        /* Detect interrupt nesting and disable EOI optimization */
        if (pv_eoi_enabled(vcpu) && vector == -2)
                pv_eoi_clr_pending(vcpu);

-2 is returned when irr is set.


> > 
> > How come is this compatible with hyper-v again? Enlight me.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:15           ` Marcelo Tosatti
  2012-05-16 18:25             ` Gleb Natapov
  2012-05-16 18:28             ` Michael S. Tsirkin
@ 2012-05-16 18:35             ` Michael S. Tsirkin
  2 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 18:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >  
> > > > > > -	if (vector == -1)
> > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > +	if (vector < 0)
> > > > > 
> > > > > With interrupt window exiting, the guest will exit:
> > > > > 
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > 
> > > > > Doesnt this make this bit unnecessary ?
> > > > 
> > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > > 
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > 
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > > 
> > > It decreases the amount of state that must be maintained.
> > > 
> > > BTW there is a bug covered by interrupt window exiting:
> > > 
> > > vcpu0                               host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set, 
> > > no other bit set in irr.
> > > - enter guest
> > > 
> > >                                     irr 4 set
> > >                                     kick vcpu0 out of guest mode
> > > 
> > > - eoi pending bit not set
> > >   (previous interrupt injection 
> > >    still pending)
> > > - skip eoi
> > > 
> > > If it were not for interrupt window exiting, this would 
> > > inject vector 4 on an unrelated exit who knows how long 
> > > in the future.
> > > 
> > > Also note optimization depends on the fact that the host 
> > > kicks vcpu out unconditionally (so it is dependent on 
> > > certain kvm implementation details).
> > > 
> > 
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
> 
> Right, it is not redundant.
> 
> The above is still a bug: a case where eoi pending bit is not updated
> properly.

The diagram shows pv eoi in guest is disabled so no need
for pending bit?


> How come is this compatible with hyper-v again? Enlight me.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:29               ` Michael S. Tsirkin
@ 2012-05-16 18:38                 ` Gleb Natapov
  2012-05-16 19:07                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-16 18:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 09:29:49PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >  
> > > > > > > > -	if (vector == -1)
> > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > +	if (vector < 0)
> > > > > > > 
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > 
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > 
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > 
> > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > > I guess there are reasons to exit on interrupt window but
> > > > > > isn't it better to make the feature independent of it?
> > > > > 
> > > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > > > 
> > > > > > This almost never happens in my testing anyway, so
> > > > > > however we handle it is unlikely to affect performance.
> > > > > 
> > > > > It decreases the amount of state that must be maintained.
> > > > > 
> > > > > BTW there is a bug covered by interrupt window exiting:
> > > > > 
> > > > > vcpu0                               host
> > > > > - irr 5 set
> > > > > - isr 5 set, irr 5 cleared
> > > > > - eoi_skip bit not set, 
> > > > > no other bit set in irr.
> > > > > - enter guest
> > > > > 
> > > > >                                     irr 4 set
> > > > >                                     kick vcpu0 out of guest mode
> > > > > 
> > > > > - eoi pending bit not set
> > > > >   (previous interrupt injection 
> > > > >    still pending)
> > > > > - skip eoi
> > > > > 
> > > > > If it were not for interrupt window exiting, this would 
> > > > > inject vector 4 on an unrelated exit who knows how long 
> > > > > in the future.
> > > > > 
> > > > > Also note optimization depends on the fact that the host 
> > > > > kicks vcpu out unconditionally (so it is dependent on 
> > > > > certain kvm implementation details).
> > > > > 
> > > > 
> > > > Look we can summarize as follows: irq windows exit is
> > > > required both before and after this patch.
> > > > But it does not make the check above redundant.
> > > 
> > > Right, it is not redundant.
> > > 
> > > The above is still a bug: a case where eoi pending bit is not updated
> > > properly.
> > When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> > Michael does your patch do that?
> 
> I think this does it:
>         /* Detect interrupt nesting and disable EOI optimization */
>         if (pv_eoi_enabled(vcpu) && vector == -2)
>                 pv_eoi_clr_pending(vcpu);
> 
> -2 is returned when irr is set.
> 
This code is reached from kvm_cpu_get_interrupt(), but this function will
not be called in above scenario.

> 
> > > 
> > > How come is this compatible with hyper-v again? Enlight me.
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 18:38                 ` Gleb Natapov
@ 2012-05-16 19:07                   ` Michael S. Tsirkin
  2012-05-16 21:37                     ` Marcelo Tosatti
  2012-05-17  7:28                     ` Gleb Natapov
  0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-16 19:07 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 09:38:22PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 09:29:49PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > >  
> > > > > > > > > -	if (vector == -1)
> > > > > > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > +	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > +		pv_eoi_clr_pending(vcpu);
> > > > > > > > > +
> > > > > > > > > +	if (vector < 0)
> > > > > > > > 
> > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > > 
> > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any 
> > > > > > > > interrupt pending in IRR.
> > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > > 
> > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > > 
> > > > > > > Looks like we could cut it out.  But I'm not sure how architectural it is
> > > > > > > that we exit on interrupt window.
> > > > > > > I guess there are reasons to exit on interrupt window but
> > > > > > > isn't it better to make the feature independent of it?
> > > > > > 
> > > > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > > > > 
> > > > > > > This almost never happens in my testing anyway, so
> > > > > > > however we handle it is unlikely to affect performance.
> > > > > > 
> > > > > > It decreases the amount of state that must be maintained.
> > > > > > 
> > > > > > BTW there is a bug covered by interrupt window exiting:
> > > > > > 
> > > > > > vcpu0                               host
> > > > > > - irr 5 set
> > > > > > - isr 5 set, irr 5 cleared
> > > > > > - eoi_skip bit not set, 
> > > > > > no other bit set in irr.
> > > > > > - enter guest
> > > > > > 
> > > > > >                                     irr 4 set
> > > > > >                                     kick vcpu0 out of guest mode
> > > > > > 
> > > > > > - eoi pending bit not set
> > > > > >   (previous interrupt injection 
> > > > > >    still pending)
> > > > > > - skip eoi
> > > > > > 
> > > > > > If it were not for interrupt window exiting, this would 
> > > > > > inject vector 4 on an unrelated exit who knows how long 
> > > > > > in the future.
> > > > > > 
> > > > > > Also note optimization depends on the fact that the host 
> > > > > > kicks vcpu out unconditionally (so it is dependent on 
> > > > > > certain kvm implementation details).
> > > > > > 
> > > > > 
> > > > > Look we can summarize as follows: irq windows exit is
> > > > > required both before and after this patch.
> > > > > But it does not make the check above redundant.
> > > > 
> > > > Right, it is not redundant.
> > > > 
> > > > The above is still a bug: a case where eoi pending bit is not updated
> > > > properly.
> > > When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> > > Michael does your patch do that?
> > 
> > I think this does it:
> >         /* Detect interrupt nesting and disable EOI optimization */
> >         if (pv_eoi_enabled(vcpu) && vector == -2)
> >                 pv_eoi_clr_pending(vcpu);
> > 
> > -2 is returned when irr is set.
> > 
> This code is reached from kvm_cpu_get_interrupt(), but this function will
> not be called in above scenario.

I think I see. So this shall fix it also makes code cleaner
(no -2 hack). Right? kvm_apic_has_interrupt is called correct?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b4f7013..5a38e34 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	highest_irr = apic_find_highest_irr(apic);
 	if (highest_irr == -1)
 		return -1;
-	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
-		return -2;
+	/* Detect interrupt nesting and disable EOI optimization */
+	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
+		if (pv_eoi_enabled(vcpu))
+			pv_eoi_clr_pending(vcpu);
+		return -1;
+	}
 	return highest_irr;
 }
 
@@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	int vector = kvm_apic_has_interrupt(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	/* Detect interrupt nesting and disable EOI optimization */
-	if (pv_eoi_enabled(vcpu) && vector == -2)
-		pv_eoi_clr_pending(vcpu);
-
 	if (vector < 0)
 		return -1;
 

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 19:07                   ` Michael S. Tsirkin
@ 2012-05-16 21:37                     ` Marcelo Tosatti
  2012-05-17  7:28                     ` Gleb Natapov
  1 sibling, 0 replies; 39+ messages in thread
From: Marcelo Tosatti @ 2012-05-16 21:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gleb Natapov, x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
	Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > not be called in above scenario.
> 
> I think I see. So this shall fix it also makes code cleaner
> (no -2 hack). Right? kvm_apic_has_interrupt is called correct?

Yes, but its used by multiple callsites. Best to unify it (both setting
and clearing) in KVM_REQ_EVENT processing before guest entry.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 19:07                   ` Michael S. Tsirkin
  2012-05-16 21:37                     ` Marcelo Tosatti
@ 2012-05-17  7:28                     ` Gleb Natapov
  2012-05-17  7:49                       ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-17  7:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > not be called in above scenario.
> 
> I think I see. So this shall fix it also makes code cleaner
> (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b4f7013..5a38e34 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>  	highest_irr = apic_find_highest_irr(apic);
>  	if (highest_irr == -1)
>  		return -1;
> -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> -		return -2;
> +	/* Detect interrupt nesting and disable EOI optimization */
> +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> +		if (pv_eoi_enabled(vcpu))
> +			pv_eoi_clr_pending(vcpu);
> +		return -1;
> +	}
>  	return highest_irr;
>  }
>  
I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
state.

> @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  	int vector = kvm_apic_has_interrupt(vcpu);
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -	/* Detect interrupt nesting and disable EOI optimization */
> -	if (pv_eoi_enabled(vcpu) && vector == -2)
> -		pv_eoi_clr_pending(vcpu);
> -
>  	if (vector < 0)
>  		return -1;
>  

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  7:28                     ` Gleb Natapov
@ 2012-05-17  7:49                       ` Michael S. Tsirkin
  2012-05-17  7:56                         ` Gleb Natapov
  2012-05-17  7:57                         ` Avi Kivity
  0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-17  7:49 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > not be called in above scenario.
> > 
> > I think I see. So this shall fix it also makes code cleaner
> > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index b4f7013..5a38e34 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> >  	highest_irr = apic_find_highest_irr(apic);
> >  	if (highest_irr == -1)
> >  		return -1;
> > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > -		return -2;
> > +	/* Detect interrupt nesting and disable EOI optimization */
> > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > +		if (pv_eoi_enabled(vcpu))
> > +			pv_eoi_clr_pending(vcpu);
> > +		return -1;
> > +	}
> >  	return highest_irr;
> >  }
> >  
> I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> state.

OK, so let's rename it so it's clear it can mutate state?

> > @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> >  	int vector = kvm_apic_has_interrupt(vcpu);
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> >  
> > -	/* Detect interrupt nesting and disable EOI optimization */
> > -	if (pv_eoi_enabled(vcpu) && vector == -2)
> > -		pv_eoi_clr_pending(vcpu);
> > -
> >  	if (vector < 0)
> >  		return -1;
> >  
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  7:49                       ` Michael S. Tsirkin
@ 2012-05-17  7:56                         ` Gleb Natapov
  2012-05-17  7:57                         ` Avi Kivity
  1 sibling, 0 replies; 39+ messages in thread
From: Gleb Natapov @ 2012-05-17  7:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marcelo Tosatti, x86, kvm, Ingo Molnar, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 10:49:47AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > not be called in above scenario.
> > > 
> > > I think I see. So this shall fix it also makes code cleaner
> > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > 
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index b4f7013..5a38e34 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > >  	highest_irr = apic_find_highest_irr(apic);
> > >  	if (highest_irr == -1)
> > >  		return -1;
> > > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > -		return -2;
> > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > +		if (pv_eoi_enabled(vcpu))
> > > +			pv_eoi_clr_pending(vcpu);
> > > +		return -1;
> > > +	}
> > >  	return highest_irr;
> > >  }
> > >  
> > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > state.
> 
> OK, so let's rename it so it's clear it can mutate state?
> 
It is not about the name, it is about how function is used, and it is
used to check if there is something to deliver. We should clear pv_eoi
during interrupt delivery to apic. May be setting vcpu request bit and
calling pv_eoi_clr_pending() on a guest entry. 

> > > @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > >  	int vector = kvm_apic_has_interrupt(vcpu);
> > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > >  
> > > -	/* Detect interrupt nesting and disable EOI optimization */
> > > -	if (pv_eoi_enabled(vcpu) && vector == -2)
> > > -		pv_eoi_clr_pending(vcpu);
> > > -
> > >  	if (vector < 0)
> > >  		return -1;
> > >  
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  7:49                       ` Michael S. Tsirkin
  2012-05-17  7:56                         ` Gleb Natapov
@ 2012-05-17  7:57                         ` Avi Kivity
  2012-05-17  8:07                           ` Gleb Natapov
  2012-05-17  9:10                           ` Michael S. Tsirkin
  1 sibling, 2 replies; 39+ messages in thread
From: Avi Kivity @ 2012-05-17  7:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gleb Natapov, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > not be called in above scenario.
> > > 
> > > I think I see. So this shall fix it also makes code cleaner
> > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > 
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index b4f7013..5a38e34 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > >  	highest_irr = apic_find_highest_irr(apic);
> > >  	if (highest_irr == -1)
> > >  		return -1;
> > > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > -		return -2;
> > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > +		if (pv_eoi_enabled(vcpu))
> > > +			pv_eoi_clr_pending(vcpu);
> > > +		return -1;
> > > +	}
> > >  	return highest_irr;
> > >  }
> > >  
> > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > state.
>
> OK, so let's rename it so it's clear it can mutate state?
>

No, let's refactor this so it makes sense.  The {has|get}_interrupt
split is the cause of the problem, I think.  We need a single function,
with callbacks that are called when an event happens.  The callbacks can
request an irq window exit, inject an interrupt, play with pveoi, or
cause a #vmexit.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  7:57                         ` Avi Kivity
@ 2012-05-17  8:07                           ` Gleb Natapov
  2012-05-17  9:24                             ` Avi Kivity
  2012-05-17  9:10                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-17  8:07 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > not be called in above scenario.
> > > > 
> > > > I think I see. So this shall fix it also makes code cleaner
> > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > > 
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index b4f7013..5a38e34 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > >  	highest_irr = apic_find_highest_irr(apic);
> > > >  	if (highest_irr == -1)
> > > >  		return -1;
> > > > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > -		return -2;
> > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > +		if (pv_eoi_enabled(vcpu))
> > > > +			pv_eoi_clr_pending(vcpu);
> > > > +		return -1;
> > > > +	}
> > > >  	return highest_irr;
> > > >  }
> > > >  
> > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > state.
> >
> > OK, so let's rename it so it's clear it can mutate state?
> >
> 
> No, let's refactor this so it makes sense.  The {has|get}_interrupt
> split is the cause of the problem, I think.  We need a single function,
> with callbacks that are called when an event happens.  The callbacks can
> request an irq window exit, inject an interrupt, play with pveoi, or
> cause a #vmexit.
> 
Not sure what do you mean here. I kind of like the code we have now, but
this may be because I understand it :)

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  7:57                         ` Avi Kivity
  2012-05-17  8:07                           ` Gleb Natapov
@ 2012-05-17  9:10                           ` Michael S. Tsirkin
  2012-05-17  9:12                             ` Gleb Natapov
  1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2012-05-17  9:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gleb Natapov, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > not be called in above scenario.
> > > > 
> > > > I think I see. So this shall fix it also makes code cleaner
> > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > > 
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index b4f7013..5a38e34 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > >  	highest_irr = apic_find_highest_irr(apic);
> > > >  	if (highest_irr == -1)
> > > >  		return -1;
> > > > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > -		return -2;
> > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > +		if (pv_eoi_enabled(vcpu))
> > > > +			pv_eoi_clr_pending(vcpu);
> > > > +		return -1;
> > > > +	}
> > > >  	return highest_irr;
> > > >  }
> > > >  
> > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > state.
> >
> > OK, so let's rename it so it's clear it can mutate state?
> >
> 
> No, let's refactor this so it makes sense.  The {has|get}_interrupt
> split is the cause of the problem, I think.  We need a single function,
> with callbacks that are called when an event happens.  The callbacks can
> request an irq window exit, inject an interrupt, play with pveoi, or
> cause a #vmexit.

OK will look into this next week.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  9:10                           ` Michael S. Tsirkin
@ 2012-05-17  9:12                             ` Gleb Natapov
  2012-05-17  9:25                               ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2012-05-17  9:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 12:10:55PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> > On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > > not be called in above scenario.
> > > > > 
> > > > > I think I see. So this shall fix it also makes code cleaner
> > > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > > > 
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index b4f7013..5a38e34 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > > >  	highest_irr = apic_find_highest_irr(apic);
> > > > >  	if (highest_irr == -1)
> > > > >  		return -1;
> > > > > -	if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > > -		return -2;
> > > > > +	/* Detect interrupt nesting and disable EOI optimization */
> > > > > +	if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > > +		if (pv_eoi_enabled(vcpu))
> > > > > +			pv_eoi_clr_pending(vcpu);
> > > > > +		return -1;
> > > > > +	}
> > > > >  	return highest_irr;
> > > > >  }
> > > > >  
> > > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > > state.
> > >
> > > OK, so let's rename it so it's clear it can mutate state?
> > >
> > 
> > No, let's refactor this so it makes sense.  The {has|get}_interrupt
> > split is the cause of the problem, I think.  We need a single function,
> > with callbacks that are called when an event happens.  The callbacks can
> > request an irq window exit, inject an interrupt, play with pveoi, or
> > cause a #vmexit.
> 
> OK will look into this next week.
> 
What about looking into my suggestion with setting vcpu->request bit? I
think totally refactoring irq handling path should not be done as part
of this series.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  8:07                           ` Gleb Natapov
@ 2012-05-17  9:24                             ` Avi Kivity
  2012-05-17  9:34                               ` Gleb Natapov
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2012-05-17  9:24 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Michael S. Tsirkin, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On 05/17/2012 11:07 AM, Gleb Natapov wrote:
> > 
> > No, let's refactor this so it makes sense.  The {has|get}_interrupt
> > split is the cause of the problem, I think.  We need a single function,
> > with callbacks that are called when an event happens.  The callbacks can
> > request an irq window exit, inject an interrupt, play with pveoi, or
> > cause a #vmexit.
> > 
> Not sure what do you mean here. I kind of like the code we have now, but
> this may be because I understand it :)

Right now we have

   if (has_interrupt)
       do something
   if (get_interrupt)
       do_something_else

this duplicates some of the logic and causes non-atomicty (which isn't a
problem per se, but requires us to think of what happens if conditions
change between the two steps).

What I'm thinking of is

   void process_interrupt(bool (*handle)());

Where the return value tells us whether the interrupt was accepted by
the handler.  The callback could decide to enable the irq window, to
queue the interrupt, or to #vmexit (note that the latter can return
either true or false, depending on whether vmx is configured to ack the
interrupt or not; svm would return true here if interrupts are intercepted).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  9:12                             ` Gleb Natapov
@ 2012-05-17  9:25                               ` Avi Kivity
  0 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2012-05-17  9:25 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Michael S. Tsirkin, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On 05/17/2012 12:12 PM, Gleb Natapov wrote:
> > 
> > OK will look into this next week.
> > 
> What about looking into my suggestion with setting vcpu->request bit? I
> think totally refactoring irq handling path should not be done as part
> of this series.
>

Certainly not as part of it, but maybe as a prerequisite.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-16 11:46 ` [PATCHv4 3/5] kvm: host side for eoi optimization Michael S. Tsirkin
  2012-05-16 15:49   ` Marcelo Tosatti
@ 2012-05-17  9:28   ` Avi Kivity
  1 sibling, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2012-05-17  9:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel

On 05/16/2012 02:46 PM, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes as much as by half.
>
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
>
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
>
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
>
>  
> +	/*
> +	 * It's legal for guest to ignore the PV EOI optimization
> +	 * and signal EOI by APIC write. If this happens, clear
> +	 * PV EOI on guest's behalf.
> +	 */
> +	if (pv_eoi_enabled(apic->vcpu))
> +		pv_eoi_clr_pending(apic->vcpu);

I'm a little worried about all the clr_pending() calls scattered
around.  What happens if we forget one?  In particular, we might miss
one on nested vmentry.

A safer path is to always clear it, but to enable it again during
reentry if all conditions are satisified.  Might be a little slower though.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv4 3/5] kvm: host side for eoi optimization
  2012-05-17  9:24                             ` Avi Kivity
@ 2012-05-17  9:34                               ` Gleb Natapov
  0 siblings, 0 replies; 39+ messages in thread
From: Gleb Natapov @ 2012-05-17  9:34 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, Marcelo Tosatti, x86, kvm, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, linux-kernel

On Thu, May 17, 2012 at 12:24:30PM +0300, Avi Kivity wrote:
> On 05/17/2012 11:07 AM, Gleb Natapov wrote:
> > > 
> > > No, let's refactor this so it makes sense.  The {has|get}_interrupt
> > > split is the cause of the problem, I think.  We need a single function,
> > > with callbacks that are called when an event happens.  The callbacks can
> > > request an irq window exit, inject an interrupt, play with pveoi, or
> > > cause a #vmexit.
> > > 
> > Not sure what do you mean here. I kind of like the code we have now, but
> > this may be because I understand it :)
> 
> Right now we have
> 
>    if (has_interrupt)
>        do something
>    if (get_interrupt)
>        do_something_else
> 
Not exactly. Now we have:
  if (has_interrupt && can inject)
    inject(get_interrupt())
  if (still has_interrupt)
    notify me when I can inject it.

There is not if(get_interrupt).

> this duplicates some of the logic and causes non-atomicty (which isn't a
> problem per se, but requires us to think of what happens if conditions
> change between the two steps).
> 
> What I'm thinking of is
> 
>    void process_interrupt(bool (*handle)());
Why we even want to pass different handle() to the function?

> 
> Where the return value tells us whether the interrupt was accepted by
> the handler.  The callback could decide to enable the irq window, to
> queue the interrupt, or to #vmexit (note that the latter can return
Queuing interrupt and requesting irq window ate not mutually exclusive.

> either true or false, depending on whether vmx is configured to ack the
> interrupt or not; svm would return true here if interrupts are intercepted).
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2012-05-17  9:34 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-16 11:45 [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin
2012-05-16 11:45 ` [PATCHv4 1/5] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
2012-05-16 11:46 ` [PATCHv4 2/5] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
2012-05-16 11:46 ` [PATCHv4 3/5] kvm: host side for eoi optimization Michael S. Tsirkin
2012-05-16 15:49   ` Marcelo Tosatti
2012-05-16 16:22     ` Michael S. Tsirkin
2012-05-16 16:32       ` Gleb Natapov
2012-05-16 16:50         ` Michael S. Tsirkin
2012-05-16 17:04           ` Marcelo Tosatti
2012-05-16 17:21             ` Gleb Natapov
2012-05-16 17:23               ` Marcelo Tosatti
2012-05-16 17:34                 ` Gleb Natapov
2012-05-16 17:40                   ` Marcelo Tosatti
2012-05-16 17:48                     ` Gleb Natapov
2012-05-16 17:53                 ` Michael S. Tsirkin
2012-05-16 17:20       ` Marcelo Tosatti
2012-05-16 17:58         ` Michael S. Tsirkin
2012-05-16 18:15           ` Marcelo Tosatti
2012-05-16 18:25             ` Gleb Natapov
2012-05-16 18:29               ` Michael S. Tsirkin
2012-05-16 18:38                 ` Gleb Natapov
2012-05-16 19:07                   ` Michael S. Tsirkin
2012-05-16 21:37                     ` Marcelo Tosatti
2012-05-17  7:28                     ` Gleb Natapov
2012-05-17  7:49                       ` Michael S. Tsirkin
2012-05-17  7:56                         ` Gleb Natapov
2012-05-17  7:57                         ` Avi Kivity
2012-05-17  8:07                           ` Gleb Natapov
2012-05-17  9:24                             ` Avi Kivity
2012-05-17  9:34                               ` Gleb Natapov
2012-05-17  9:10                           ` Michael S. Tsirkin
2012-05-17  9:12                             ` Gleb Natapov
2012-05-17  9:25                               ` Avi Kivity
2012-05-16 18:28             ` Michael S. Tsirkin
2012-05-16 18:35             ` Michael S. Tsirkin
2012-05-17  9:28   ` Avi Kivity
2012-05-16 11:46 ` [PATCHv4 4/5] kvm: eoi msi documentation Michael S. Tsirkin
2012-05-16 11:46 ` [PATCHv4 5/5] kvm: only sync when attention bits set Michael S. Tsirkin
2012-05-16 15:41 ` [PATCHv4 0/5] apic: eoi optimization support Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).