* [PATCHv6 0/8] kvm: eoi optimization support
@ 2012-06-03  7:27 Michael S. Tsirkin
  2012-06-03  7:27 ` [PATCHv6 1/8] kvm: document lapic regs field Michael S. Tsirkin
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:27 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

I'm looking at reducing the interrupt overhead for virtualized guests:
some workloads spend a large part of their time processing interrupts.

On kvm, an EOI write from the guest causes an expensive exit to host; we
avoid this using shared memory.

The patches work fine on my boxes. See the individual patches
for perf tests. You need to patch qemu to whitelist the kvm feature;
the qemu patch was sent separately.

The patches are against Linus's master and apply to kvm.git
cleanly.  The last patch in the series, supplying the host
part, also depends on the ISR optimization patch, which I
have included in the series for convenience (patch 2).
I also included a documentation patch (patch 1) - it is
here since it clarifies patch 2.  This revision does not yet address
Thomas's idea of reworking the APIC page handling; changes to that
would require reworking the last patch in the series.

The rest of the patchset has not changed since v2.

Thanks,
MST

Changes from v5:
	Clear PV EOI when we cancel interrupts.
		Pointed out by Gleb.
	Always set ISR cache when we inject an interrupt.
		Suggested by Ronen Hod.

Changes from v4:
	Turn off PV EOI on each exit. Turn it back on when safe.
		Suggested by Avi.
	Address bug with nested interrupts pointed out by Marcelo.

Changes from v3:
	Address review comments by Marcelo:
		Multiple cosmetic changes eoi -> pv_eoi
		Added multiple comments
Changes from v2:
	Kill guest with GP on an illegal MSR value
	Add documentation

Changes from v1:
	Add host side patch to series
	Remove kvm-specific __test_and_clear_bit, document
	that x86 one does what we want already
	Clear msr on cpu unplug

Michael S. Tsirkin (8):
  kvm: document lapic regs field
  kvm: optimize ISR lookups
  kvm_para: guest side for eoi avoidance
  x86/bitops: note on __test_and_clear_bit atomicity
  kvm: eoi msi documentation
  kvm: only sync when attention bits set
  kvm: rearrange injection cancelling code
  kvm: host side for eoi optimization

 Documentation/virtual/kvm/msr.txt |   32 ++++++
 arch/x86/include/asm/bitops.h     |   13 ++-
 arch/x86/include/asm/kvm_host.h   |   12 +++
 arch/x86/include/asm/kvm_para.h   |    7 ++
 arch/x86/kernel/kvm.c             |   51 ++++++++++-
 arch/x86/kvm/cpuid.c              |    1 +
 arch/x86/kvm/lapic.c              |  188 +++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h              |   11 ++
 arch/x86/kvm/trace.h              |   34 +++++++
 arch/x86/kvm/x86.c                |   20 +++-
 10 files changed, 353 insertions(+), 16 deletions(-)

-- 
MST

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv6 1/8] kvm: document lapic regs field
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
@ 2012-06-03  7:27 ` Michael S. Tsirkin
  2012-06-03  7:27 ` [PATCHv6 2/8] kvm: optimize ISR lookups Michael S. Tsirkin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:27 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

The logic in find_highest_vector looks
strange until you realize the reason for the
weird memory layout: it is what the CPU
microcode expects.

Add a comment so this stops tripping people up.
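For readers tripping over this, the layout can be sketched with the
existing VEC_POS/REG_POS helpers from lapic.c (reproduced here purely
for illustration):

```c
#include <assert.h>

/* A sketch of the layout the comment documents: every 32-bit APIC
 * register sits in its own 16-byte slot, so a 256-bit vector bitmap
 * (ISR/IRR/TMR) is eight 32-bit words spaced 16 bytes apart.  These
 * mirror the VEC_POS/REG_POS helpers in arch/x86/kvm/lapic.c. */
#define VEC_POS(v) ((v) & (32 - 1))	/* bit index within one word */
#define REG_POS(v) (((v) >> 5) << 4)	/* byte offset of that word  */
```

So vector 0x31 lives at bit 0x11 of the word at byte offset 0x10, not
at bit 0x31 of a packed bitmap.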

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/lapic.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6f4ce25..d29da25 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -13,6 +13,11 @@ struct kvm_lapic {
 	u32 divide_count;
 	struct kvm_vcpu *vcpu;
 	bool irr_pending;
+	/**
+	 * APIC register page.  The layout matches the register layout seen by
+	 * the guest 1:1, because it is accessed by the vmx microcode.
+	 * Note: Only one register, the TPR, is used by the microcode.
+	 */
 	void *regs;
 	gpa_t vapic_addr;
 	struct page *vapic_page;
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 2/8] kvm: optimize ISR lookups
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
  2012-06-03  7:27 ` [PATCHv6 1/8] kvm: document lapic regs field Michael S. Tsirkin
@ 2012-06-03  7:27 ` Michael S. Tsirkin
  2012-06-12 21:08   ` Marcelo Tosatti
  2012-06-03  7:28 ` [PATCHv6 3/8] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:27 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

We perform ISR lookups twice: during interrupt
injection and on EOI. Typical workloads only have
a single bit set there. So we can avoid ISR scans by
1. counting bits as we set/clear them in ISR
2. on set, caching the injected vector number
3. on clear, invalidating the cache

The real purpose of this is enabling PV EOI
which needs to quickly validate the vector.
But non PV guests also benefit: with this patch,
and without interrupt nesting, apic_find_highest_isr
will always return immediately without scanning ISR.
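The counting/caching scheme can be modeled in isolation (this is a
standalone sketch, not the kernel code; the real implementation keeps
isr_count/isr_cache in struct kvm_lapic and falls back to
find_highest_vector when the cache is invalid):

```c
#include <assert.h>

/* Standalone model of the isr_count/isr_cache scheme: set caches the
 * injected vector, clear invalidates the cache, and the fast path
 * avoids a bitmap scan whenever the cache is valid. */
struct isr_state {
	int count;	/* number of bits set in ISR */
	int cache;	/* last injected vector, -1 if unknown */
};

static void isr_set(struct isr_state *s, int vec)
{
	s->count++;
	s->cache = vec;		/* always cache the newest injection */
}

static void isr_clear(struct isr_state *s, int vec)
{
	s->count--;
	s->cache = -1;		/* force a scan on the next lookup */
}

/* Returns the highest vector, -1 if ISR is empty, or -2 meaning
 * "cache invalid, must scan the real bitmap". */
static int isr_highest_fast(const struct isr_state *s)
{
	if (!s->count)
		return -1;
	if (s->cache != -1)
		return s->cache;
	return -2;
}
```

Without nesting there is at most one bit set, so the fast path never
has to report "must scan".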

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

This revision is slightly reworked from the last version
I sent: I don't invalidate the cache when there is more
than one bit set in ISR. This makes the code simpler,
makes cache valid in more cases and avoids a branch on data path.

Otherwise this is basically the same as v2: this revision does not
yet address Thomas's idea of reworking the APIC page handling.

 arch/x86/kvm/lapic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h |    4 ++++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c1574..db54e63 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
 	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
 
+static inline int __apic_test_and_set_vector(int vec, void *bitmap)
+{
+	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
+static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
+{
+	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
 static inline int apic_hw_enabled(struct kvm_lapic *apic)
 {
 	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
@@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
 		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
 }
 
+static u8 count_vectors(void *bitmap)
+{
+	u32 *word = bitmap;
+	int word_offset;
+	u8 count = 0;
+	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
+		count += hweight32(word[word_offset << 2]);
+	return count;
+}
+
 static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
 {
 	apic->irr_pending = true;
@@ -242,6 +262,22 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 		apic->irr_pending = true;
 }
 
+static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
+{
+	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
+		++apic->isr_count;
+	BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
+	apic->isr_cache = vec;
+}
+
+static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
+{
+	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
+		--apic->isr_count;
+	BUG_ON(apic->isr_count < 0);
+	apic->isr_cache = -1;
+}
+
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -273,6 +309,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
+	if (!apic->isr_count)
+		return -1;
+	if (likely(apic->isr_cache != -1))
+		return apic->isr_cache;
 
 	result = find_highest_vector(apic->regs + APIC_ISR);
 	ASSERT(result == -1 || result >= 16);
@@ -492,7 +532,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
 	if (vector == -1)
 		return;
 
-	apic_clear_vector(vector, apic->regs + APIC_ISR);
+	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
 
 	if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
@@ -1081,6 +1121,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 		apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
 	}
 	apic->irr_pending = false;
+	apic->isr_count = 0;
+	apic->isr_cache = -1;
 	update_divide_count(apic);
 	atomic_set(&apic->lapic_timer.pending, 0);
 	if (kvm_vcpu_is_bsp(vcpu))
@@ -1248,7 +1290,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	if (vector == -1)
 		return -1;
 
-	apic_set_vector(vector, apic->regs + APIC_ISR);
+	apic_set_isr(vector, apic);
 	apic_update_ppr(apic);
 	apic_clear_irr(vector, apic);
 	return vector;
@@ -1267,6 +1309,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
 	update_divide_count(apic);
 	start_apic_timer(apic);
 	apic->irr_pending = true;
+	apic->isr_count = count_vectors(apic->regs + APIC_ISR);
+	apic->isr_cache = -1;
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index d29da25..edc985e 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -13,6 +13,10 @@ struct kvm_lapic {
 	u32 divide_count;
 	struct kvm_vcpu *vcpu;
 	bool irr_pending;
+	/* Number of bits set in ISR. */
+	s16 isr_count;
+	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
+	int isr_cache;
 	/**
 	 * APIC register page.  The layout matches the register layout seen by
 	 * the guest 1:1, because it is accessed by the vmx microcode.
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 3/8] kvm_para: guest side for eoi avoidance
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
  2012-06-03  7:27 ` [PATCHv6 1/8] kvm: document lapic regs field Michael S. Tsirkin
  2012-06-03  7:27 ` [PATCHv6 2/8] kvm: optimize ISR lookups Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-03  7:28 ` [PATCHv6 4/8] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
The guest tests it using a single test and clear operation - this is
necessary so that the host can detect interrupt nesting - and if set,
it can skip the EOI MSR.
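The guest fast path amounts to the following (a simplified sketch of
the kvm_guest_apic_eoi_write logic below; the real code uses the
per-cpu kvm_apic_eoi word and __test_and_clear_bit, and the helper
here only models the single-instruction semantics):

```c
#include <stdbool.h>

/* Model of the guest-side fast path: one test-and-clear both checks
 * the hint and signals EOI, so a host that clears the bit when
 * injecting a nested interrupt is never missed. */
static bool test_and_clear_pv_eoi(unsigned long *hint)
{
	bool was_set = *hint & 1;
	*hint &= ~1UL;		/* stands in for a single btr insn */
	return was_set;
}

/* Returns 0 if the APIC write was skipped, 1 if it was performed. */
static int guest_eoi(unsigned long *hint, int *apic_writes)
{
	if (test_and_clear_pv_eoi(hint))
		return 0;	/* host polls the cleared bit: skip exit */
	++*apic_writes;		/* fall back to a real APIC EOI write */
	return 1;
}
```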

I ran a simple microbenchmark to show the exit reduction
(note: for testing, you need to apply the follow-up patch
'kvm: host side for eoi optimization' plus a qemu patch
I posted separately, on the host):

Before:

Performance counter stats for 'sleep 1s':

            47,357 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             5,001 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
            22,124 kvm:kvm_apic                                                 [99.98%]
            49,849 kvm:kvm_exit                                                 [99.98%]
            21,115 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
            22,937 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.98%]
            22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
            22,421 kvm:kvm_eoi                                                  [99.98%]
                 0 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
                57 kvm:kvm_emulate_insn                                         [99.99%]
                 0 kvm:vcpu_match_mmio                                          [99.99%]
                 0 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [99.99%]
               226 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002100578 seconds time elapsed

After:

 Performance counter stats for 'sleep 1s':

            28,354 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             1,347 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
             1,931 kvm:kvm_apic                                                 [99.98%]
            29,595 kvm:kvm_exit                                                 [99.98%]
            24,884 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
             1,986 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.99%]
            25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
            26,132 kvm:kvm_eoi                                                  [99.99%]
            26,593 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
               284 kvm:kvm_emulate_insn                                         [99.99%]
                68 kvm:vcpu_match_mmio                                          [99.99%]
                68 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [100.00%]
               588 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002039622 seconds time elapsed

We see that the number of exits is almost halved.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h   |    6 +++-
 arch/x86/include/asm/kvm_para.h |    7 +++++
 arch/x86/kernel/kvm.c           |   51 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index b97596e..c9c70ea 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -26,11 +26,13 @@
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
 /* Technically wrong, but this avoids compilation errors on some gcc
    versions. */
-#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "=m"
 #else
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "+m"
 #endif
 
+#define BITOP_ADDR(x) BITOP_ADDR_CONSTRAINT (*(volatile long *) (x))
+
 #define ADDR				BITOP_ADDR(addr)
 
 /*
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 63ab166..2f7712e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_EOI		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN      0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -89,6 +91,11 @@ struct kvm_vcpu_pv_apf_data {
 	__u32 enabled;
 };
 
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..85cd6ac 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,8 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 #include <asm/idle.h>
+#include <asm/apic.h>
+#include <asm/apicdef.h>
 
 static int kvmapf = 1;
 
@@ -283,6 +285,24 @@ static void kvm_register_steal_time(void)
 		cpu, __pa(st));
 }
 
+/* size alignment is implied but just to make it explicit. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) =
+	KVM_PV_EOI_DISABLED;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+	/**
+	 * This relies on __test_and_clear_bit to modify the memory
+	 * in a way that is atomic with respect to the local CPU.
+	 * The hypervisor only accesses this memory from the local CPU so
+	 * there's no need for lock or memory barriers.
+	 * An optimization barrier is implied in apic write.
+	 */
+	if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
+		return;
+	apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -300,11 +320,17 @@ void __cpuinit kvm_guest_cpu_init(void)
 		       smp_processor_id());
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		__get_cpu_var(kvm_apic_eoi) = 0;
+		wrmsrl(MSR_KVM_PV_EOI_EN, __pa(&__get_cpu_var(kvm_apic_eoi)) |
+		       KVM_MSR_ENABLED);
+	}
+
 	if (has_steal_clock)
 		kvm_register_steal_time();
 }
 
-static void kvm_pv_disable_apf(void *unused)
+static void kvm_pv_disable_apf(void)
 {
 	if (!__get_cpu_var(apf_reason).enabled)
 		return;
@@ -316,11 +342,18 @@ static void kvm_pv_disable_apf(void *unused)
 	       smp_processor_id());
 }
 
+static void kvm_pv_guest_cpu_reboot(void *unused)
+{
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
+}
+
 static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
 	return NOTIFY_DONE;
 }
 
@@ -371,7 +404,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
 static void kvm_guest_cpu_offline(void *dummy)
 {
 	kvm_disable_steal_time();
-	kvm_pv_disable_apf(NULL);
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	kvm_pv_disable_apf();
 	apf_task_wake_all();
 }
 
@@ -424,6 +459,16 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+		struct apic **drv;
+
+		for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+			/* Should happen once for each apic */
+			WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+			(*drv)->eoi_write = kvm_guest_apic_eoi_write;
+		}
+	}
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 4/8] x86/bitops: note on __test_and_clear_bit atomicity
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2012-06-03  7:28 ` [PATCHv6 3/8] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-03  7:28 ` [PATCHv6 5/8] kvm: eoi msi documentation Michael S. Tsirkin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

__test_and_clear_bit is actually atomic with respect
to the local CPU. Add a note saying that KVM on x86
relies on this behaviour so people don't accidentally break it.
Also warn not to rely on this in portable code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/bitops.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index c9c70ea..86f3a1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
  * This operation is non-atomic and can be reordered.
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
+ *
+ * Note: the operation is performed atomically with respect to
+ * the local CPU, but not other CPUs. Portable code should not
+ * rely on this behaviour.
+ * KVM relies on this behaviour on x86 for modifying memory that is also
+ * accessed from a hypervisor on the same CPU if running in a VM: don't change
+ * this without also updating arch/x86/kernel/kvm.c
  */
 static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 {
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 5/8] kvm: eoi msi documentation
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2012-06-03  7:28 ` [PATCHv6 4/8] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-12 22:24   ` Marcelo Tosatti
  2012-06-03  7:28 ` [PATCHv6 6/8] kvm: only sync when attention bits set Michael S. Tsirkin
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Document the new EOI MSR. I couldn't decide whether this change belongs
conceptually on the guest or the host side, so it is a separate patch.
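For illustration, the value the guest writes to enable the feature can
be sketched as follows (pv_eoi_msr_val is a hypothetical helper; the
KVM_MSR_ENABLED flag is the same one the other KVM PV MSRs use):

```c
#include <assert.h>
#include <stdint.h>

#define KVM_MSR_ENABLED 1ULL	/* bit 0: enable, as for other PV MSRs */

/* The data written to the PV EOI MSR: bits 63-1 carry the 2-byte
 * aligned guest physical address of the flag word, bit 0 enables it. */
static uint64_t pv_eoi_msr_val(uint64_t gpa, int enable)
{
	assert((gpa & 1) == 0);	/* address must be 2-byte aligned */
	return enable ? (gpa | KVM_MSR_ENABLED) : 0;
}
```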

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Documentation/virtual/kvm/msr.txt |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 96b41bd..f202a22 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -223,3 +223,35 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_EOI_EN: 0x4b564d04
+	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
+	when disabled.  When enabled, bits 63-1 hold 2-byte aligned physical address
+	of a 2 byte memory area which must be in guest RAM and must be zeroed.
+
+	The first, least significant bit of 2 byte memory location will be
+	written to by the hypervisor, typically at the time of interrupt
+	injection.  Value of 1 means that guest can skip writing EOI to the apic
+	(using MSR or MMIO write); instead, it is sufficient to signal
+	EOI by clearing the bit in guest memory - this location will
+	later be polled by the hypervisor.
+	Value of 0 means that the EOI write is required.
+
+	It is always safe for the guest to ignore the optimization and perform
+	the APIC EOI write anyway.
+
+	Hypervisor is guaranteed to only modify this least
+	significant bit while in the current VCPU context, this means that
+	guest does not need to use either lock prefix or memory ordering
+	primitives to synchronise with the hypervisor.
+
+	However, hypervisor can set and clear this memory bit at any time:
+	therefore to make sure hypervisor does not interrupt the
+	guest and clear the least significant bit in the memory area
+	in the window between guest testing it to detect
+	whether it can skip EOI apic write and between guest
+	clearing it to signal EOI to the hypervisor,
+	guest must both read the least significant bit in the memory area and
+	clear it using a single CPU instruction, such as test and clear, or
+	compare and exchange.
+
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2012-06-03  7:28 ` [PATCHv6 5/8] kvm: eoi msi documentation Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-12 22:27   ` Marcelo Tosatti
  2012-06-03  7:28 ` [PATCHv6 7/8] kvm: rearrange injection cancelling code Michael S. Tsirkin
  2012-06-03  7:28 ` [PATCHv6 8/8] kvm: host side for eoi optimization Michael S. Tsirkin
  7 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
attention bitmask but kvm still syncs lapic unconditionally.
As that commit suggested and in anticipation of adding more attention
bits, only sync lapic if(apic_attention).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/x86.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..2f70861 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(vcpu->arch.tsc_always_catchup))
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
-	kvm_lapic_sync_from_vapic(vcpu);
+	if (unlikely(vcpu->arch.apic_attention))
+		kvm_lapic_sync_from_vapic(vcpu);
 
 	r = kvm_x86_ops->handle_exit(vcpu);
 out:
-- 
MST


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv6 7/8] kvm: rearrange injection cancelling code
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2012-06-03  7:28 ` [PATCHv6 6/8] kvm: only sync when attention bits set Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-03  7:28 ` [PATCHv6 8/8] kvm: host side for eoi optimization Michael S. Tsirkin
  7 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Each time we need to cancel injection we invoke same code
(cancel_injection callback).  Move it towards the end of function using
the familiar goto on error pattern.

Will make it easier to do more cleanups for PV EOI.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/x86.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f70861..67824f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5296,8 +5296,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	r = kvm_mmu_reload(vcpu);
 	if (unlikely(r)) {
-		kvm_x86_ops->cancel_injection(vcpu);
-		goto out;
+		goto cancel_injection;
 	}
 
 	preempt_disable();
@@ -5322,9 +5321,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
-		kvm_x86_ops->cancel_injection(vcpu);
 		r = 1;
-		goto out;
+		goto cancel_injection;
 	}
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
@@ -5392,6 +5390,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		kvm_lapic_sync_from_vapic(vcpu);
 
 	r = kvm_x86_ops->handle_exit(vcpu);
+	return r;
+
+cancel_injection:
+	kvm_x86_ops->cancel_injection(vcpu);
 out:
 	return r;
 }
-- 
MST



* [PATCHv6 8/8] kvm: host side for eoi optimization
  2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2012-06-03  7:28 ` [PATCHv6 7/8] kvm: rearrange injection cancelling code Michael S. Tsirkin
@ 2012-06-03  7:28 ` Michael S. Tsirkin
  2012-06-13 21:10   ` Marcelo Tosatti
  7 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-03  7:28 UTC (permalink / raw)
  To: x86, kvm
  Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

Implementation of PV EOI using shared memory.
This reduces the number of exits an interrupt
causes by as much as half.

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear it
before injecting a nested one. The guest tests it using
a test-and-clear operation - this is necessary
so that the host can detect interrupt nesting -
and if it is set, the guest can skip the EOI MSR write.

There's a new MSR to set the address of said register
in guest memory. Otherwise not much changed:
- Guest EOI is not required
- Register is tested & ISR is automatically cleared on exit

For testing results see description of previous patch
'kvm_para: guest side for eoi avoidance'.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   12 ++++
 arch/x86/kvm/cpuid.c            |    1 +
 arch/x86/kvm/lapic.c            |  140 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.h            |    2 +
 arch/x86/kvm/trace.h            |   34 ++++++++++
 arch/x86/kvm/x86.c              |    7 ++
 6 files changed, 192 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..dfc066c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,13 @@ enum {
 
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
+/*
+ * The following bit is set with PV-EOI, unset on EOI.
+ * We detect PV-EOI changes by the guest by comparing
+ * this bit with the PV-EOI bit in guest memory.
+ * See the implementation in apic_sync_pv_eoi_from_guest.
+ */
+#define KVM_APIC_PV_EOI_PENDING	1
 
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
@@ -484,6 +491,11 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+	} pv_eoi;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7df1c6d..61ccbdf 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,6 +409,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
+			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 
 		if (sched_info_on())
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index db54e63..d16cfc5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -306,6 +306,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 			irq->level, irq->trig_mode);
 }
 
+static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
+				      sizeof(val));
+}
+
+static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
+{
+
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
+				      sizeof(*val));
+}
+
+static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
+}
+
+static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
+{
+	u8 val = 0;
+	if (pv_eoi_get_user(vcpu, &val) < 0)
+		apic_debug("Can't read EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+	return val & 0x1;
+}
+
+static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
+		apic_debug("Can't set EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+		return;
+	}
+	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
+static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
+{
+	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
+		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
+		return;
+	}
+	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
@@ -522,15 +570,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
-static void apic_set_eoi(struct kvm_lapic *apic)
+static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
+
+	trace_kvm_eoi(apic, vector);
+
 	/*
 	 * Not every write EOI will has corresponding ISR,
 	 * one example is when Kernel check timer on setup_IO_APIC
 	 */
 	if (vector == -1)
-		return;
+		return vector;
 
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
@@ -545,6 +596,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
 		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
 	}
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+	return vector;
 }
 
 static void apic_send_ipi(struct kvm_lapic *apic)
@@ -1127,6 +1179,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	atomic_set(&apic->lapic_timer.pending, 0);
 	if (kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
 
 	vcpu->arch.apic_arb_prio = 0;
@@ -1327,11 +1380,51 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
+/*
+ * apic_sync_pv_eoi_from_guest - called on vmexit or cancel interrupt
+ *
+ * Detect whether the guest triggered PV EOI since the
+ * last entry. If yes, set EOI on the guest's behalf.
+ * Clear PV EOI in guest memory in any case.
+ */
+static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
+					struct kvm_lapic *apic)
+{
+	bool pending;
+	int vector;
+	/*
+	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
+	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
+	 *
+	 * KVM_APIC_PV_EOI_PENDING is unset:
+	 * 	-> host disabled PV EOI.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
+	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
+	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
+	 * 	-> host enabled PV EOI, guest executed EOI.
+	 */
+	BUG_ON(!pv_eoi_enabled(vcpu));
+	pending = pv_eoi_get_pending(vcpu);
+	/*
+	 * Clear pending bit in any case: it will be set again on vmentry.
+	 * While this might not be ideal from a performance point of view,
+	 * this makes sure pv eoi is only enabled when we know it's safe.
+	 */
+	pv_eoi_clr_pending(vcpu);
+	if (pending)
+		return;
+	vector = apic_set_eoi(apic);
+	trace_kvm_pv_eoi(apic, vector);
+}
+
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
 {
 	u32 data;
 	void *vapic;
 
+	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
+		apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
+
 	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
 		return;
 
@@ -1342,17 +1435,44 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
 	apic_set_tpr(vcpu->arch.apic, data & 0xff);
 }
 
+/*
+ * apic_sync_pv_eoi_to_guest - called before vmentry
+ *
+ * Detect whether it's safe to enable PV EOI and
+ * if yes do so.
+ */
+static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
+					struct kvm_lapic *apic)
+{
+	if (!pv_eoi_enabled(vcpu) ||
+	    /* IRR set or many bits in ISR: could be nested. */
+	    unlikely(apic->irr_pending) ||
+	    /* Cache not set: could be safe but we don't bother. */
+	    unlikely(apic->isr_cache == -1) ||
+	    /* Need EOI to update ioapic. */
+	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache))) {
+		/*
+		 * PV EOI was disabled by apic_sync_pv_eoi_from_guest
+		 * so we need not do anything here.
+		 */
+		return;
+	}
+
+	pv_eoi_set_pending(apic->vcpu);
+}
+
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
 {
 	u32 data, tpr;
 	int max_irr, max_isr;
-	struct kvm_lapic *apic;
+	struct kvm_lapic *apic = vcpu->arch.apic;
 	void *vapic;
 
+	apic_sync_pv_eoi_to_guest(vcpu, apic);
+
 	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
 		return;
 
-	apic = vcpu->arch.apic;
 	tpr = apic_get_reg(apic, APIC_TASKPRI) & 0xff;
 	max_irr = apic_find_highest_irr(apic);
 	if (max_irr < 0)
@@ -1438,3 +1558,15 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 
 	return 0;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 addr = data & ~KVM_MSR_ENABLED;
+	if (pv_eoi_enabled(vcpu))
+		pv_eoi_clr_pending(vcpu);
+	vcpu->arch.pv_eoi.msr_val = data;
+	if (!pv_eoi_enabled(vcpu))
+		return 0;
+	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+					 addr);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index edc985e..ff8efd4 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -69,4 +69,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
 }
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
 #endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..851914e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
 		  __entry->coalesced ? " (coalesced)" : "")
 );
 
+TRACE_EVENT(kvm_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
+TRACE_EVENT(kvm_pv_eoi,
+	    TP_PROTO(struct kvm_lapic *apic, int vector),
+	    TP_ARGS(apic, vector),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	int,		vector		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apic->vcpu->vcpu_id;
+		__entry->vector		= vector;
+	),
+
+	TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
 /*
  * Tracepoint for nested VMRUN
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 67824f0..5790af8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_EOI_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
+	case MSR_KVM_PV_EOI_EN:
+		if (kvm_lapic_enable_pv_eoi(vcpu, data))
+			return 1;
+		break;
 
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
@@ -5394,6 +5399,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 cancel_injection:
 	kvm_x86_ops->cancel_injection(vcpu);
+	if (unlikely(vcpu->arch.apic_attention))
+		kvm_lapic_sync_from_vapic(vcpu);
 out:
 	return r;
 }
-- 
MST


* Re: [PATCHv6 2/8] kvm: optimize ISR lookups
  2012-06-03  7:27 ` [PATCHv6 2/8] kvm: optimize ISR lookups Michael S. Tsirkin
@ 2012-06-12 21:08   ` Marcelo Tosatti
  2012-06-13  9:02     ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-12 21:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Sun, Jun 03, 2012 at 10:27:59AM +0300, Michael S. Tsirkin wrote:
> We perform ISR lookups twice: during interrupt
> injection and on EOI. Typical workloads only have
> a single bit set there. So we can avoid ISR scans by
> 1. counting bits as we set/clear them in ISR
> 2. on set, caching the injected vector number
> 3. on clear, invalidating the cache
> 
> The real purpose of this is enabling PV EOI
> which needs to quickly validate the vector.
> But non PV guests also benefit: with this patch,
> and without interrupt nesting, apic_find_highest_isr
> will always return immediately without scanning ISR.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> This revision is slightly reworked from the last version
> I sent: I don't invalidate the cache when there is more
> than one bit set in ISR. This makes the code simpler,
> makes cache valid in more cases and avoids a branch on data path.
> 
> Otherwise this is basically the same as v2: this revision does not
> yet address Thomas's idea of reworking the APIC page handling.
> 
>  arch/x86/kvm/lapic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/lapic.h |    4 ++++
>  2 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..db54e63 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
>  	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>  }
>  
> +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> +{
> +	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
> +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> +{
> +	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
>  static inline int apic_hw_enabled(struct kvm_lapic *apic)
>  {
>  	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
>  		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
>  }
>  
> +static u8 count_vectors(void *bitmap)
> +{
> +	u32 *word = bitmap;
> +	int word_offset;
> +	u8 count = 0;
> +	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> +		count += hweight32(word[word_offset << 2]);
> +	return count;
> +}
> +
>  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
>  {
>  	apic->irr_pending = true;
> @@ -242,6 +262,22 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
>  		apic->irr_pending = true;
>  }
>  
> +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> +{
> +	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> +		++apic->isr_count;
> +	BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
> +	apic->isr_cache = vec;
> +}
> +
> +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> +{
> +	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> +		--apic->isr_count;
> +	BUG_ON(apic->isr_count < 0);
> +	apic->isr_cache = -1;
> +}
> +
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -273,6 +309,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>  	int result;
> +	if (!apic->isr_count)
> +		return -1;
> +	if (likely(apic->isr_cache != -1))
> +		return apic->isr_cache;

What about TPR updates that modify highest ISR available? They are not
fully covered. Ex: writes to TPR register not via CR8.

apic_update_ppr should reset cache?

Instead of isr_cache what about a highest_isr field?

When setting ISR:

if (apic->highest_isr < me)
	apic->highest_isr = me;

To be invalidated on TPR updates properly.

It's more meaningful.



* Re: [PATCHv6 5/8] kvm: eoi msi documentation
  2012-06-03  7:28 ` [PATCHv6 5/8] kvm: eoi msi documentation Michael S. Tsirkin
@ 2012-06-12 22:24   ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-12 22:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Sun, Jun 03, 2012 at 10:28:21AM +0300, Michael S. Tsirkin wrote:
> Document the new EOI MSR. Couldn't decide whether this change belongs
> conceptually on guest or host side, so a separate patch.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  Documentation/virtual/kvm/msr.txt |   32 ++++++++++++++++++++++++++++++++
>  1 files changed, 32 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
> index 96b41bd..f202a22 100644
> --- a/Documentation/virtual/kvm/msr.txt
> +++ b/Documentation/virtual/kvm/msr.txt
> @@ -223,3 +223,35 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>  		steal: the amount of time in which this vCPU did not run, in
>  		nanoseconds. Time during which the vcpu is idle, will not be
>  		reported as steal time.
> +
> +MSR_KVM_EOI_EN: 0x4b564d04
> +	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
> +	when disabled.  When enabled, bits 63-1 hold 2-byte aligned physical address
> +	of a 2 byte memory area which must be in guest RAM and must be zeroed.
> +
> +	The first, least significant bit of 2 byte memory location will be
> +	written to by the hypervisor, typically at the time of interrupt
> +	injection.  Value of 1 means that guest can skip writing EOI to the apic
> +	(using MSR or MMIO write); instead, it is sufficient to signal
> +	EOI by clearing the bit in guest memory - this location will
> +	later be polled by the hypervisor.
> +	Value of 0 means that the EOI write is required.
> +
> +	It is always safe for the guest to ignore the optimization and perform
> +	the APIC EOI write anyway.
> +
> +	Hypervisor is guaranteed to only modify this least
> +	significant bit while in the current VCPU context, this means that
> +	guest does not need to use either lock prefix or memory ordering
> +	primitives to synchronise with the hypervisor.
> +
> +	However, hypervisor can set and clear this memory bit at any time:
> +	therefore to make sure hypervisor does not interrupt the
> +	guest and clear the least significant bit in the memory area
> +	in the window between guest testing it to detect
> +	whether it can skip EOI apic write and between guest
> +	clearing it to signal EOI to the hypervisor,
> +	guest must both read the least sgnificant bit in the memory area and

typo

> +	clear it using a single CPU instruction, such as test and clear, or
> +	compare and exchange.
> +
> -- 



* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-03  7:28 ` [PATCHv6 6/8] kvm: only sync when attention bits set Michael S. Tsirkin
@ 2012-06-12 22:27   ` Marcelo Tosatti
  2012-06-13  8:19     ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-12 22:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> attention bitmask but kvm still syncs lapic unconditionally.
> As that commit suggested and in anticipation of adding more attention
> bits, only sync lapic if(apic_attention).
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/kvm/x86.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index be6d549..2f70861 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	if (unlikely(vcpu->arch.tsc_always_catchup))
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>  
> -	kvm_lapic_sync_from_vapic(vcpu);
> +	if (unlikely(vcpu->arch.apic_attention))
> +		kvm_lapic_sync_from_vapic(vcpu);

void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
{
        u32 data;
        void *vapic;

        if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
                return;

Please use unlikely more carefully, when a gain is measurable:
http://lwn.net/Articles/420019/



* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-12 22:27   ` Marcelo Tosatti
@ 2012-06-13  8:19     ` Michael S. Tsirkin
  2012-06-13  8:35       ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-13  8:19 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > attention bitmask but kvm still syncs lapic unconditionally.
> > As that commit suggested and in anticipation of adding more attention
> > bits, only sync lapic if(apic_attention).
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  arch/x86/kvm/x86.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index be6d549..2f70861 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> >  
> > -	kvm_lapic_sync_from_vapic(vcpu);
> > +	if (unlikely(vcpu->arch.apic_attention))
> > +		kvm_lapic_sync_from_vapic(vcpu);
> 
> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> {
>         u32 data;
>         void *vapic;
> 
>         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>                 return;
> 
> Please use unlikely more carefully, when a gain is measurable:
> http://lwn.net/Articles/420019/

Do we have to measure every single thing?
Sometimes it's obvious: vapic is a slow path, isn't it?

-- 
MST


* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-13  8:19     ` Michael S. Tsirkin
@ 2012-06-13  8:35       ` Michael S. Tsirkin
  2012-06-13 20:53         ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-13  8:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 11:19:24AM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> > On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > > attention bitmask but kvm still syncs lapic unconditionally.
> > > As that commit suggested and in anticipation of adding more attention
> > > bits, only sync lapic if(apic_attention).
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  arch/x86/kvm/x86.c |    3 ++-
> > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index be6d549..2f70861 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> > >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> > >  
> > > -	kvm_lapic_sync_from_vapic(vcpu);
> > > +	if (unlikely(vcpu->arch.apic_attention))
> > > +		kvm_lapic_sync_from_vapic(vcpu);
> > 
> > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > {
> >         u32 data;
> >         void *vapic;
> > 
> >         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> >                 return;
> > 
> > Please use unlikely more carefully, when a gain is measurable:
> > http://lwn.net/Articles/420019/
> 
> Do we have to measure every single thing?
> Sometimes it's obvious: vapic is a slow path, isn't it?

Just to clarify the question: I think it's obvious this condition is
false more often than true. By how much depends on the workload.
Do you think this is enough to tag this unlikely?


> -- 
> MST


* Re: [PATCHv6 2/8] kvm: optimize ISR lookups
  2012-06-12 21:08   ` Marcelo Tosatti
@ 2012-06-13  9:02     ` Michael S. Tsirkin
  2012-06-13  9:26       ` Michael S. Tsirkin
  2012-06-13 20:21       ` Marcelo Tosatti
  0 siblings, 2 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-13  9:02 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Tue, Jun 12, 2012 at 06:08:33PM -0300, Marcelo Tosatti wrote:
> On Sun, Jun 03, 2012 at 10:27:59AM +0300, Michael S. Tsirkin wrote:
> > We perform ISR lookups twice: during interrupt
> > injection and on EOI. Typical workloads only have
> > a single bit set there. So we can avoid ISR scans by
> > 1. counting bits as we set/clear them in ISR
> > 2. on set, caching the injected vector number
> > 3. on clear, invalidating the cache
> > 
> > The real purpose of this is enabling PV EOI
> > which needs to quickly validate the vector.
> > But non PV guests also benefit: with this patch,
> > and without interrupt nesting, apic_find_highest_isr
> > will always return immediately without scanning ISR.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > This revision is slightly reworked from the last version
> > I sent: I don't invalidate the cache when there is more
> > than one bit set in ISR. This makes the code simpler,
> > makes cache valid in more cases and avoids a branch on data path.
> > 
> > Otherwise this is basically the same as v2: this revision does not
> > yet address Thomas's idea of reworking the APIC page handling.
> > 
> >  arch/x86/kvm/lapic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/lapic.h |    4 ++++
> >  2 files changed, 50 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 93c1574..db54e63 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
> >  	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> >  }
> >  
> > +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> > +{
> > +	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > +}
> > +
> > +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> > +{
> > +	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > +}
> > +
> >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
> >  {
> >  	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> > @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
> >  		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> >  }
> >  
> > +static u8 count_vectors(void *bitmap)
> > +{
> > +	u32 *word = bitmap;
> > +	int word_offset;
> > +	u8 count = 0;
> > +	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> > +		count += hweight32(word[word_offset << 2]);
> > +	return count;
> > +}
> > +
> >  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
> >  {
> >  	apic->irr_pending = true;
> > @@ -242,6 +262,22 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
> >  		apic->irr_pending = true;
> >  }
> >  
> > +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> > +{
> > +	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> > +		++apic->isr_count;
> > +	BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
> > +	apic->isr_cache = vec;
> > +}
> > +
> > +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> > +{
> > +	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> > +		--apic->isr_count;
> > +	BUG_ON(apic->isr_count < 0);
> > +	apic->isr_cache = -1;
> > +}
> > +
> >  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > @@ -273,6 +309,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> >  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> >  {
> >  	int result;
> > +	if (!apic->isr_count)
> > +		return -1;
> > +	if (likely(apic->isr_cache != -1))
> > +		return apic->isr_cache;
> 
> What about TPR updates that modify highest ISR available? They are not
> fully covered. Ex: writes to TPR register not via CR8.

Sorry, I don't yet understand what you are saying.

Why would TPR/PPR updates need to affect any of this?
TPR affects the injection of interrupts, but if we don't inject
an interrupt we don't set ISR, right?

Maybe I misunderstand, I'm going to look more at the spec, but maybe
you could point out an example where this logic is wrong?

> 
> apic_update_ppr should reset cache?

I think this would kill the optimization as we call this all the time.
With this patch applied ISR was searched in less than 1% of the cases
in my testing, it almost always was either a cache hit or count == 0.

> Instead of isr_cache what about a highest_isr field?

> 
> When setting ISR:
> 
> if (apic->highest_isr < me)
> 	apic->highest_isr = me;
> 
> To be invalidated on TPR updates properly.
> 
> Its more meaningful.

OK, I'll rename it highest_isr but we do not need the if:

IIUC, an ISR (in-service register) bit is only set when we inject an
interrupt.  So the latest bit we set tells us exactly which ISR bit is
the highest: it would not be set if it was not the highest.  We are simply
caching this last bit which to me looks cleaner than adding logic here:
if we did we'd also need to special-case the invalid value, so it
becomes tricky.

I'll add a comment here that explains the above.

As I said I don't yet understand why we need to invalidate on TPR updates.

-- 
MST


* Re: [PATCHv6 2/8] kvm: optimize ISR lookups
  2012-06-13  9:02     ` Michael S. Tsirkin
@ 2012-06-13  9:26       ` Michael S. Tsirkin
  2012-06-13 20:21       ` Marcelo Tosatti
  1 sibling, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-13  9:26 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 12:02:42PM +0300, Michael S. Tsirkin wrote:
> > Instead of isr_cache what about a highest_isr field?
> 
> > 
> > When setting ISR:
> > 
> > if (apic->highest_isr < me)
> > 	apic->highest_isr = me;
> > 
> > To be invalidated on TPR updates properly.
> > 
> > Its more meaningful.
> 
> OK, I'll rename it highest_isr

Even better - highest_isr_cache.
Makes it explicit that it can be invalid.

-- 
MST


* Re: [PATCHv6 2/8] kvm: optimize ISR lookups
  2012-06-13  9:02     ` Michael S. Tsirkin
  2012-06-13  9:26       ` Michael S. Tsirkin
@ 2012-06-13 20:21       ` Marcelo Tosatti
  1 sibling, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-13 20:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 12:02:43PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 12, 2012 at 06:08:33PM -0300, Marcelo Tosatti wrote:
> > On Sun, Jun 03, 2012 at 10:27:59AM +0300, Michael S. Tsirkin wrote:
> > > We perform ISR lookups twice: during interrupt
> > > injection and on EOI. Typical workloads only have
> > > a single bit set there. So we can avoid ISR scans by
> > > 1. counting bits as we set/clear them in ISR
> > > 2. on set, caching the injected vector number
> > > 3. on clear, invalidating the cache
> > > 
> > > The real purpose of this is enabling PV EOI
> > > which needs to quickly validate the vector.
> > > But non PV guests also benefit: with this patch,
> > > and without interrupt nesting, apic_find_highest_isr
> > > will always return immediately without scanning ISR.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > This revision is slightly reworked from the last version
> > > I sent: I don't invalidate the cache when there is more
> > > than one bit set in ISR. This makes the code simpler,
> > > makes cache valid in more cases and avoids a branch on data path.
> > > 
> > > Otherwise this is basically the same as v2: this revision does not
> > > yet address Thomas's idea of reworking the APIC page handling.
> > > 
> > >  arch/x86/kvm/lapic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
> > >  arch/x86/kvm/lapic.h |    4 ++++
> > >  2 files changed, 50 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 93c1574..db54e63 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
> > >  	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > >  }
> > >  
> > > +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> > > +{
> > > +	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > > +}
> > > +
> > > +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> > > +{
> > > +	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > > +}
> > > +
> > >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
> > >  {
> > >  	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> > > @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
> > >  		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> > >  }
> > >  
> > > +static u8 count_vectors(void *bitmap)
> > > +{
> > > +	u32 *word = bitmap;
> > > +	int word_offset;
> > > +	u8 count = 0;
> > > +	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> > > +		count += hweight32(word[word_offset << 2]);
> > > +	return count;
> > > +}
> > > +
> > >  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
> > >  {
> > >  	apic->irr_pending = true;
> > > @@ -242,6 +262,22 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
> > >  		apic->irr_pending = true;
> > >  }
> > >  
> > > +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> > > +{
> > > +	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> > > +		++apic->isr_count;
> > > +	BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
> > > +	apic->isr_cache = vec;
> > > +}
> > > +
> > > +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> > > +{
> > > +	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> > > +		--apic->isr_count;
> > > +	BUG_ON(apic->isr_count < 0);
> > > +	apic->isr_cache = -1;
> > > +}
> > > +
> > >  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct kvm_lapic *apic = vcpu->arch.apic;
> > > @@ -273,6 +309,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> > >  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> > >  {
> > >  	int result;
> > > +	if (!apic->isr_count)
> > > +		return -1;
> > > +	if (likely(apic->isr_cache != -1))
> > > +		return apic->isr_cache;
> > 
> > What about TPR updates that modify highest ISR available? They are not
> > fully covered. Ex: writes to TPR register not via CR8.
> 
> Sorry, I don't yet understand what you are saying.
> 
> Why would TPR/PPR updates need to affect any of this:
> TPR affects the injection of interrupts but if we don't inject
> an interrupt we don't set ISR, right?

They don't, ignore that comment.

> Maybe I misunderstand, I'm going to look more at the spec, but maybe
> you could point out an example where this logic is wrong?
> 
> > 
> > apic_update_ppr should reset cache?
> 
> I think this would kill the optimization as we call this all the time.
> With this patch applied ISR was searched in less than 1% of the cases
> in my testing, it almost always was either a cache hit or count == 0.
> 
> > Instead of isr_cache what about a highest_isr field?
> 
> > 
> > When setting ISR:
> > 
> > if (apic->highest_isr < me)
> > 	apic->highest_isr = me;
> > 
> > To be invalidated on TPR updates properly.
> > 
> > Its more meaningful.
> 
> OK, I'll rename it highest_isr but we do not need the if:
> 
> IIUC, ISR (in service register) bit is only set when we inject an
> interrupt.  So the latest bit we set tells us exactly which isr is the
> highest: it would not be set if it was not the highest.  We are simply
> caching this last bit which to me looks cleaner than adding logic here:
> if we did we'd also need to special-case the invalid value, so it
> becomes tricky.
> 
> I'll add a comment here that explains the above.
> 
> As I said I don't yet understand why we need to invalidate on TPR updates.



* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-13  8:35       ` Michael S. Tsirkin
@ 2012-06-13 20:53         ` Marcelo Tosatti
  2012-06-13 21:04           ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-13 20:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 11:35:07AM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 11:19:24AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> > > On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > > > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > > > attention bitmask but kvm still syncs lapic unconditionally.
> > > > As that commit suggested and in anticipation of adding more attention
> > > > bits, only sync lapic if(apic_attention).
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/x86.c |    3 ++-
> > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index be6d549..2f70861 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> > > >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> > > >  
> > > > -	kvm_lapic_sync_from_vapic(vcpu);
> > > > +	if (unlikely(vcpu->arch.apic_attention))
> > > > +		kvm_lapic_sync_from_vapic(vcpu);
> > > 
> > > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > > {
> > >         u32 data;
> > >         void *vapic;
> > > 
> > >         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> > >                 return;
> > > 
> > > Please use unlikely more carefully, when a gain is measurable:
> > > http://lwn.net/Articles/420019/
> > 
> > Do we have to measure every single thing?
> > Sometimes it's obvious: vapic is slow path, isn't it?
> 
> Just to clarify the question: I think it's obvious this condition is
> false more often than true. By how much, depends on the workload.
> Do you think this is enough to tag this unlikely?

Depends whether your processor supports flexpriority or not. I don't
want to argue in favour/against the particular instance

GCC docs:

"
— Built-in Function: long __builtin_expect (long exp, long c)

    You may use __builtin_expect to provide the compiler with branch
prediction information. In general, you should prefer to use actual
profile feedback for this (-fprofile-arcs), as programmers are
notoriously bad at predicting how their programs actually perform.
However, there are applications in which this data is hard to collect.
"

Lately half of branches in your patches are annotated.



* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-13 20:53         ` Marcelo Tosatti
@ 2012-06-13 21:04           ` Michael S. Tsirkin
  2012-06-13 23:38             ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-13 21:04 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 05:53:36PM -0300, Marcelo Tosatti wrote:
> On Wed, Jun 13, 2012 at 11:35:07AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 13, 2012 at 11:19:24AM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> > > > On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > > > > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > > > > attention bitmask but kvm still syncs lapic unconditionally.
> > > > > As that commit suggested and in anticipation of adding more attention
> > > > > bits, only sync lapic if(apic_attention).
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >  arch/x86/kvm/x86.c |    3 ++-
> > > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index be6d549..2f70861 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > > >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> > > > >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> > > > >  
> > > > > -	kvm_lapic_sync_from_vapic(vcpu);
> > > > > +	if (unlikely(vcpu->arch.apic_attention))
> > > > > +		kvm_lapic_sync_from_vapic(vcpu);
> > > > 
> > > > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > > > {
> > > >         u32 data;
> > > >         void *vapic;
> > > > 
> > > >         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> > > >                 return;
> > > > 
> > > > Please use unlikely more carefully, when a gain is measurable:
> > > > http://lwn.net/Articles/420019/
> > > 
> > > Do we have to measure every single thing?
> > > Sometimes it's obvious: vapic is slow path, isn't it?
> > 
> > Just to clarify the question: I think it's obvious this condition is
> > false more often than true. By how much, depends on the workload.
> > Do you think this is enough to tag this unlikely?
> 
> Depends whether your processor supports flexpriority or not. I don't
> want to argue in favour/against the particular instance
> 
> GCC docs:
> 
> "
> — Built-in Function: long __builtin_expect (long exp, long c)
> 
>     You may use __builtin_expect to provide the compiler with branch
> prediction information. In general, you should prefer to use actual
> profile feedback for this (-fprofile-arcs), as programmers are
> notoriously bad at predicting how their programs actually perform.
> However, there are applications in which this data is hard to collect.
> "
> 
> Lately half of branches in your patches are annotated.

So if I instrument and show that the branch is almost never taken, is that enough?
This citation does not require measuring the perf impact.



* Re: [PATCHv6 8/8] kvm: host side for eoi optimization
  2012-06-03  7:28 ` [PATCHv6 8/8] kvm: host side for eoi optimization Michael S. Tsirkin
@ 2012-06-13 21:10   ` Marcelo Tosatti
  2012-06-14  8:01     ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-13 21:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Sun, Jun 03, 2012 at 10:28:43AM +0300, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes by as much as half.
> 
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
> 
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
> 
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   12 ++++
>  arch/x86/kvm/cpuid.c            |    1 +
>  arch/x86/kvm/lapic.c            |  140 +++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/lapic.h            |    2 +
>  arch/x86/kvm/trace.h            |   34 ++++++++++
>  arch/x86/kvm/x86.c              |    7 ++
>  6 files changed, 192 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index db7c1f2..dfc066c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -175,6 +175,13 @@ enum {
>  
>  /* apic attention bits */
>  #define KVM_APIC_CHECK_VAPIC	0
> +/*
> + * The following bit is set with PV-EOI, unset on EOI.
> + * We detect PV-EOI changes by guest by comparing
> + * this bit with PV-EOI in guest memory.
> + * See the implementation in apic_update_pv_eoi.
> + */ 
> +#define KVM_APIC_PV_EOI_PENDING	1
>  
>  /*
>   * We don't want allocation failures within the mmu code, so we preallocate
> @@ -484,6 +491,11 @@ struct kvm_vcpu_arch {
>  		u64 length;
>  		u64 status;
>  	} osvw;
> +
> +	struct {
> +		u64 msr_val;
> +		struct gfn_to_hva_cache data;
> +	} pv_eoi;
>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7df1c6d..61ccbdf 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -409,6 +409,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> +			     (1 << KVM_FEATURE_PV_EOI) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>  
>  		if (sched_info_on())
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index db54e63..d16cfc5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -306,6 +306,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>  			irq->level, irq->trig_mode);
>  }
>  
> +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> +{
> +
> +	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
> +				      sizeof(val));
> +}
> +
> +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> +{
> +
> +	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
> +				      sizeof(*val));
> +}
> +
> +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
> +}
> +
> +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
> +{
> +	u8 val;
> +	if (pv_eoi_get_user(vcpu, &val) < 0)
> +		apic_debug("Can't read EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +	return val & 0x1;
> +}
> +
> +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
> +		apic_debug("Can't set EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
> +{
> +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
> +		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> +		return;
> +	}
> +	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
>  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>  	int result;
> @@ -522,15 +570,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>  	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>  }
>  
> -static void apic_set_eoi(struct kvm_lapic *apic)
> +static int apic_set_eoi(struct kvm_lapic *apic)
>  {
>  	int vector = apic_find_highest_isr(apic);
> +
> +	trace_kvm_eoi(apic, vector);
> +
>  	/*
> >  	 * Not every EOI write has a corresponding ISR bit;
> >  	 * one example is when the kernel checks the timer on setup_IO_APIC
>  	 */
>  	if (vector == -1)
> -		return;
> +		return vector;
>  
>  	apic_clear_isr(vector, apic);
>  	apic_update_ppr(apic);
> @@ -545,6 +596,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
>  		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>  	}
>  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> +	return vector;
>  }
>  
>  static void apic_send_ipi(struct kvm_lapic *apic)
> @@ -1127,6 +1179,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  	if (kvm_vcpu_is_bsp(vcpu))
>  		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> +	vcpu->arch.pv_eoi.msr_val = 0;
>  	apic_update_ppr(apic);
>  
>  	vcpu->arch.apic_arb_prio = 0;
> @@ -1327,11 +1380,51 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>  		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
>  }
>  
> +/*
> + * apic_sync_pv_eoi_from_guest - called on vmexit or cancel interrupt
> + *
> + * Detect whether guest triggered PV EOI since the
> > + * last entry. If yes, set EOI on the guest's behalf.
> + * Clear PV EOI in guest memory in any case.
> + */
> +static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
> +					struct kvm_lapic *apic)
> +{
> +	bool pending;
> +	int vector;
> +	/*
> +	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
> +	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
> +	 *
> +	 * KVM_APIC_PV_EOI_PENDING is unset:
> +	 * 	-> host disabled PV EOI.
> +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
> +	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
> +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
> +	 * 	-> host enabled PV EOI, guest executed EOI.
> +	 */
> +	BUG_ON(!pv_eoi_enabled(vcpu));
> +	pending = pv_eoi_get_pending(vcpu);
> +	/*
> +	 * Clear pending bit in any case: it will be set again on vmentry.
> +	 * While this might not be ideal from performance point of view,
> +	 * this makes sure pv eoi is only enabled when we know it's safe.
> +	 */
> +	pv_eoi_clr_pending(vcpu);
> +	if (pending)
> +		return;
> +	vector = apic_set_eoi(apic);
> +	trace_kvm_pv_eoi(apic, vector);
> +}
> +
>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>  {
>  	u32 data;
>  	void *vapic;
>  
> +	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
> +		apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
> +
>  	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>  		return;
>  
> @@ -1342,17 +1435,44 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
>  }
>  
> +/*
> + * apic_sync_pv_eoi_to_guest - called before vmentry
> + *
> + * Detect whether it's safe to enable PV EOI and
> + * if yes do so.
> + */
> +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> +					struct kvm_lapic *apic)
> +{
> +	if (!pv_eoi_enabled(vcpu) ||
> +	    /* IRR set or many bits in ISR: could be nested. */
> +	    unlikely(apic->irr_pending) ||
> +	    /* Cache not set: could be safe but we don't bother. */
> +	    unlikely(apic->isr_cache == -1) ||
> +	    /* Need EOI to update ioapic. */
> +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache))) {

This adds an lfence. Can't you move this logic to ioapic configuration
time?

Otherwise looks good (clear and very well commented).



* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-13 21:04           ` Michael S. Tsirkin
@ 2012-06-13 23:38             ` Marcelo Tosatti
  2012-06-14  8:04               ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2012-06-13 23:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Thu, Jun 14, 2012 at 12:04:23AM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2012 at 05:53:36PM -0300, Marcelo Tosatti wrote:
> > On Wed, Jun 13, 2012 at 11:35:07AM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 13, 2012 at 11:19:24AM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> > > > > On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > > > > > attention bitmask but kvm still syncs lapic unconditionally.
> > > > > > As that commit suggested and in anticipation of adding more attention
> > > > > > bits, only sync lapic if(apic_attention).
> > > > > > 
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > ---
> > > > > >  arch/x86/kvm/x86.c |    3 ++-
> > > > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index be6d549..2f70861 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > > > >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> > > > > >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> > > > > >  
> > > > > > -	kvm_lapic_sync_from_vapic(vcpu);
> > > > > > +	if (unlikely(vcpu->arch.apic_attention))
> > > > > > +		kvm_lapic_sync_from_vapic(vcpu);
> > > > > 
> > > > > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > > > > {
> > > > >         u32 data;
> > > > >         void *vapic;
> > > > > 
> > > > >         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> > > > >                 return;
> > > > > 
> > > > > Please use unlikely more carefully, when a gain is measurable:
> > > > > http://lwn.net/Articles/420019/
> > > > 
> > > > Do we have to measure every single thing?
> > > > Sometimes it's obvious: vapic is slow path, isn't it?
> > > 
> > > Just to clarify the question: I think it's obvious this condition is
> > > false more often than true. By how much, depends on the workload.
> > > Do you think this is enough to tag this unlikely?
> > 
> > Depends whether your processor supports flexpriority or not. I don't
> > want to argue in favour/against the particular instance
> > 
> > GCC docs:
> > 
> > "
> > — Built-in Function: long __builtin_expect (long exp, long c)
> > 
> >     You may use __builtin_expect to provide the compiler with branch
> > prediction information. In general, you should prefer to use actual
> > profile feedback for this (-fprofile-arcs), as programmers are
> > notoriously bad at predicting how their programs actually perform.
> > However, there are applications in which this data is hard to collect.
> > "
> > 
> > Lately half of branches in your patches are annotated.
> 
> So if I instrument and show that the branch is almost never taken, is that enough?
> This citation does not require measuring the perf impact.

Without flexpriority it's always taken.




* Re: [PATCHv6 8/8] kvm: host side for eoi optimization
  2012-06-13 21:10   ` Marcelo Tosatti
@ 2012-06-14  8:01     ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14  8:01 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 06:10:15PM -0300, Marcelo Tosatti wrote:
> On Sun, Jun 03, 2012 at 10:28:43AM +0300, Michael S. Tsirkin wrote:
> > Implementation of PV EOI using shared memory.
> > This reduces the number of exits an interrupt
> > causes by as much as half.
> > 
> > The idea is simple: there's a bit, per APIC, in guest memory,
> > that tells the guest that it does not need EOI.
> > We set it before injecting an interrupt and clear
> > before injecting a nested one. Guest tests it using
> > a test and clear operation - this is necessary
> > so that host can detect interrupt nesting -
> > and if set, it can skip the EOI MSR.
> > 
> > There's a new MSR to set the address of said register
> > in guest memory. Otherwise not much changed:
> > - Guest EOI is not required
> > - Register is tested & ISR is automatically cleared on exit
> > 
> > For testing results see description of previous patch
> > 'kvm_para: guest side for eoi avoidance'.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |   12 ++++
> >  arch/x86/kvm/cpuid.c            |    1 +
> >  arch/x86/kvm/lapic.c            |  140 +++++++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/lapic.h            |    2 +
> >  arch/x86/kvm/trace.h            |   34 ++++++++++
> >  arch/x86/kvm/x86.c              |    7 ++
> >  6 files changed, 192 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index db7c1f2..dfc066c 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -175,6 +175,13 @@ enum {
> >  
> >  /* apic attention bits */
> >  #define KVM_APIC_CHECK_VAPIC	0
> > +/*
> > + * The following bit is set with PV-EOI, unset on EOI.
> > + * We detect PV-EOI changes by guest by comparing
> > + * this bit with PV-EOI in guest memory.
> > + * See the implementation in apic_update_pv_eoi.
> > + */ 
> > +#define KVM_APIC_PV_EOI_PENDING	1
> >  
> >  /*
> >   * We don't want allocation failures within the mmu code, so we preallocate
> > @@ -484,6 +491,11 @@ struct kvm_vcpu_arch {
> >  		u64 length;
> >  		u64 status;
> >  	} osvw;
> > +
> > +	struct {
> > +		u64 msr_val;
> > +		struct gfn_to_hva_cache data;
> > +	} pv_eoi;
> >  };
> >  
> >  struct kvm_lpage_info {
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 7df1c6d..61ccbdf 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -409,6 +409,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> >  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
> >  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
> >  			     (1 << KVM_FEATURE_ASYNC_PF) |
> > +			     (1 << KVM_FEATURE_PV_EOI) |
> >  			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> >  
> >  		if (sched_info_on())
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index db54e63..d16cfc5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -306,6 +306,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> >  			irq->level, irq->trig_mode);
> >  }
> >  
> > +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> > +{
> > +
> > +	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
> > +				      sizeof(val));
> > +}
> > +
> > +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> > +{
> > +
> > +	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
> > +				      sizeof(*val));
> > +}
> > +
> > +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
> > +}
> > +
> > +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	u8 val;
> > +	if (pv_eoi_get_user(vcpu, &val) < 0)
> > +		apic_debug("Can't read EOI MSR value: 0x%llx\n",
> > +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> > +	return val & 0x1;
> > +}
> > +
> > +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
> > +		apic_debug("Can't set EOI MSR value: 0x%llx\n",
> > +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> > +		return;
> > +	}
> > +	__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> > +}
> > +
> > +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
> > +		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> > +			   (unsigned long long)vcpu->arch.pv_eoi.msr_val);
> > +		return;
> > +	}
> > +	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> > +}
> > +
> >  static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> >  {
> >  	int result;
> > @@ -522,15 +570,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> >  	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
> >  }
> >  
> > -static void apic_set_eoi(struct kvm_lapic *apic)
> > +static int apic_set_eoi(struct kvm_lapic *apic)
> >  {
> >  	int vector = apic_find_highest_isr(apic);
> > +
> > +	trace_kvm_eoi(apic, vector);
> > +
> >  	/*
> > >  	 * Not every EOI write has a corresponding ISR bit;
> > >  	 * one example is when the kernel checks the timer on setup_IO_APIC
> >  	 */
> >  	if (vector == -1)
> > -		return;
> > +		return vector;
> >  
> >  	apic_clear_isr(vector, apic);
> >  	apic_update_ppr(apic);
> > @@ -545,6 +596,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
> >  		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> >  	}
> >  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> > +	return vector;
> >  }
> >  
> >  static void apic_send_ipi(struct kvm_lapic *apic)
> > @@ -1127,6 +1179,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
> >  	atomic_set(&apic->lapic_timer.pending, 0);
> >  	if (kvm_vcpu_is_bsp(vcpu))
> >  		vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> > +	vcpu->arch.pv_eoi.msr_val = 0;
> >  	apic_update_ppr(apic);
> >  
> >  	vcpu->arch.apic_arb_prio = 0;
> > @@ -1327,11 +1380,51 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
> >  		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> >  }
> >  
> > +/*
> > + * apic_sync_pv_eoi_from_guest - called on vmexit or cancel interrupt
> > + *
> > + * Detect whether guest triggered PV EOI since the
> > > + * last entry. If yes, set EOI on the guest's behalf.
> > + * Clear PV EOI in guest memory in any case.
> > + */
> > +static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
> > +					struct kvm_lapic *apic)
> > +{
> > +	bool pending;
> > +	int vector;
> > +	/*
> > +	 * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
> > +	 * and KVM_PV_EOI_ENABLED in guest memory as follows:
> > +	 *
> > +	 * KVM_APIC_PV_EOI_PENDING is unset:
> > +	 * 	-> host disabled PV EOI.
> > +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
> > +	 * 	-> host enabled PV EOI, guest did not execute EOI yet.
> > +	 * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
> > +	 * 	-> host enabled PV EOI, guest executed EOI.
> > +	 */
> > +	BUG_ON(!pv_eoi_enabled(vcpu));
> > +	pending = pv_eoi_get_pending(vcpu);
> > +	/*
> > +	 * Clear pending bit in any case: it will be set again on vmentry.
> > +	 * While this might not be ideal from performance point of view,
> > +	 * this makes sure pv eoi is only enabled when we know it's safe.
> > +	 */
> > +	pv_eoi_clr_pending(vcpu);
> > +	if (pending)
> > +		return;
> > +	vector = apic_set_eoi(apic);
> > +	trace_kvm_pv_eoi(apic, vector);
> > +}
> > +
> >  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> >  {
> >  	u32 data;
> >  	void *vapic;
> >  
> > +	if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
> > +		apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
> > +
> >  	if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> >  		return;
> >  
> > @@ -1342,17 +1435,44 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> >  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
> >  }
> >  
> > +/*
> > + * apic_sync_pv_eoi_to_guest - called before vmentry
> > + *
> > + * Detect whether it's safe to enable PV EOI and
> > + * if yes do so.
> > + */
> > +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> > +					struct kvm_lapic *apic)
> > +{
> > +	if (!pv_eoi_enabled(vcpu) ||
> > +	    /* IRR set or many bits in ISR: could be nested. */
> > +	    unlikely(apic->irr_pending) ||
> > +	    /* Cache not set: could be safe but we don't bother. */
> > +	    unlikely(apic->isr_cache == -1) ||
> > +	    /* Need EOI to update ioapic. */
> > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache))) {
> 
> This adds an lfence.

It's an smp_rmb:
#ifdef CONFIG_X86_PPRO_FENCE
# define smp_rmb()      rmb()
#else
# define smp_rmb()      barrier()
#endif

So it is only an lfence on the Pentium Pro.
Is there a Pentium Pro CPU with kvm support?

> Can't you move this logic to ioapic configuration
> time?

Not sure how to do this yet.  My testing shows that avoiding the exit
helps so much that we hardly notice the ioapic lookup cost.  The code
would become much less self-contained if I had to touch the ioapic.
Is it really worth the complication?

> Otherwise looks good (clear and very well commented).


* Re: [PATCHv6 6/8] kvm: only sync when attention bits set
  2012-06-13 23:38             ` Marcelo Tosatti
@ 2012-06-14  8:04               ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2012-06-14  8:04 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity, gleb,
	Linus Torvalds, linux-kernel, Thomas Gleixner

On Wed, Jun 13, 2012 at 08:38:46PM -0300, Marcelo Tosatti wrote:
> On Thu, Jun 14, 2012 at 12:04:23AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 13, 2012 at 05:53:36PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Jun 13, 2012 at 11:35:07AM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 13, 2012 at 11:19:24AM +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 12, 2012 at 07:27:48PM -0300, Marcelo Tosatti wrote:
> > > > > > On Sun, Jun 03, 2012 at 10:28:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > > Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
> > > > > > > attention bitmask but kvm still syncs lapic unconditionally.
> > > > > > > As that commit suggested and in anticipation of adding more attention
> > > > > > > bits, only sync lapic if(apic_attention).
> > > > > > > 
> > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > ---
> > > > > > >  arch/x86/kvm/x86.c |    3 ++-
> > > > > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index be6d549..2f70861 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -5388,7 +5388,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > > > > >  	if (unlikely(vcpu->arch.tsc_always_catchup))
> > > > > > >  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> > > > > > >  
> > > > > > > -	kvm_lapic_sync_from_vapic(vcpu);
> > > > > > > +	if (unlikely(vcpu->arch.apic_attention))
> > > > > > > +		kvm_lapic_sync_from_vapic(vcpu);
> > > > > > 
> > > > > > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > > > > > {
> > > > > >         u32 data;
> > > > > >         void *vapic;
> > > > > > 
> > > > > >         if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> > > > > >                 return;
> > > > > > 
> > > > > > Please use unlikely more carefully, when a gain is measureable:
> > > > > > http://lwn.net/Articles/420019/
> > > > > 
> > > > > Do we have to measure every single thing?
> > > > > Sometimes it's obvious: vapic is a slow path, isn't it?
> > > > 
> > > > Just to clarify the question: I think it's obvious this condition is
> > > > false more often than true. By how much, depends on the workload.
> > > > Do you think this is enough to tag this unlikely?
> > > 
> > > Depends whether your processor supports flexpriority or not. I don't
> > > want to argue in favour/against the particular instance
> > > 
> > > GCC docs:
> > > 
> > > "
> > > — Built-in Function: long __builtin_expect (long exp, long c)
> > > 
> > >     You may use __builtin_expect to provide the compiler with branch
> > > prediction information. In general, you should prefer to use actual
> > > profile feedback for this (-fprofile-arcs), as programmers are
> > > notoriously bad at predicting how their programs actually perform.
> > > However, there are applications in which this data is hard to collect.
> > > "
> > > 
> > > Lately half of branches in your patches are annotated.
> > 
> > So if I instrument and show that the branch is almost never taken, is that enough?
> > That citation does not require measuring the perf impact.
> 
> Without flexpriority it's always taken.

OK, I will remove this one and keep the unlikely annotations only
for the branches that my measurements show are taken less than
1% of the time.



end of thread, other threads:[~2012-06-14  8:03 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-03  7:27 [PATCHv6 0/8] kvm: eoi optimization support Michael S. Tsirkin
2012-06-03  7:27 ` [PATCHv6 1/8] kvm: document lapic regs field Michael S. Tsirkin
2012-06-03  7:27 ` [PATCHv6 2/8] kvm: optimize ISR lookups Michael S. Tsirkin
2012-06-12 21:08   ` Marcelo Tosatti
2012-06-13  9:02     ` Michael S. Tsirkin
2012-06-13  9:26       ` Michael S. Tsirkin
2012-06-13 20:21       ` Marcelo Tosatti
2012-06-03  7:28 ` [PATCHv6 3/8] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
2012-06-03  7:28 ` [PATCHv6 4/8] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
2012-06-03  7:28 ` [PATCHv6 5/8] kvm: eoi msi documentation Michael S. Tsirkin
2012-06-12 22:24   ` Marcelo Tosatti
2012-06-03  7:28 ` [PATCHv6 6/8] kvm: only sync when attention bits set Michael S. Tsirkin
2012-06-12 22:27   ` Marcelo Tosatti
2012-06-13  8:19     ` Michael S. Tsirkin
2012-06-13  8:35       ` Michael S. Tsirkin
2012-06-13 20:53         ` Marcelo Tosatti
2012-06-13 21:04           ` Michael S. Tsirkin
2012-06-13 23:38             ` Marcelo Tosatti
2012-06-14  8:04               ` Michael S. Tsirkin
2012-06-03  7:28 ` [PATCHv6 7/8] kvm: rearrange injection cancelling code Michael S. Tsirkin
2012-06-03  7:28 ` [PATCHv6 8/8] kvm: host side for eoi optimization Michael S. Tsirkin
2012-06-13 21:10   ` Marcelo Tosatti
2012-06-14  8:01     ` Michael S. Tsirkin
