* [PATCH 00/25] KVM: x86: Speed up emulation of invalid state
@ 2014-06-09 12:58 Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 01/25] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
                   ` (24 more replies)
  0 siblings, 25 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

This series, done in collaboration with Bandan Das, speeds up
emulation of invalid state by approximately a factor of 4
(as measured by realmode.flat).  It brings together patches sent
as RFC in the past 3 months, and adds a few more on top.

On individual instructions, the speedup is around 3x.  Some changes shave a
constant number of cycles off every instruction; others only affect the more
complex instructions that take more clock cycles to run.  Together, these two
different effects make the speedup nicely homogeneous across various kinds
of instructions.  Here are rough numbers (expressed in clock cycles on a
Sandy Bridge Xeon machine, with unrestricted_guest=0) at various points
of the series:

    jump  move  arith load  store RMW
    2300  2600  2500  2800  2800  3200
    1650  1950  1900  2150  2150  2600   KVM: vmx: speed up emulation of invalid guest state
    900   1250  1050  1350  1300  1700   KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
    900   1050  1050  1350  1300  1700   KVM: emulate: speed up emulated moves
    900   1050  1050  1300  1250  1400   KVM: emulate: extend memory access optimization to stores
    825   1000  1000  1250  1200  1350   KVM: emulate: do not initialize memopp
    750   950   950   1150  1050  1200   KVM: emulate: avoid per-byte copying in instruction fetches
    720   850   850   1075  1000  1100   KVM: x86: use kvm_read_guest_page for emulator accesses

The above only lists the patches where the improvement on kvm-unit-tests
became consistently identifiable and reproducible.  Take these numbers with
a grain of salt: all the rounding was done by hand, no standard deviation is
provided, etc.

I tried to be quite strict and limited this series to patches that obey
the following criteria:

* either the patch is by itself a measurable improvement
(example: patch 6)

* or the patch is a really, really obvious improvement (example:
patch 17); the compiler would have to seriously screw up for this not to
be the case

* or the patch is just preparatory for a subsequent measurable
improvement.

Quite a few functions disappear from the profile, and others have their
cost cut by a pretty large factor.  The columns are profile sample counts
before and after the series (no second number means the function no longer
shows up at all):

   61643         [kvm_intel]       vmx_segment_access_rights
   47504         [kvm]             vcpu_enter_guest
   34610         [kvm_intel]       rmode_segment_valid
   30312  7119   [kvm_intel]       vmx_get_segment
   27371 23363   [kvm]             x86_decode_insn
   20924 21185   [kernel.kallsyms] copy_user_generic_string
   18775  3614   [kvm_intel]       vmx_read_guest_seg_selector
   18040  9580   [kvm]             emulator_get_segment
   16061  5791   [kvm]             do_insn_fetch (__do_insn_fetch_bytes after patches)
   15834  5530   [kvm]             kvm_read_guest (kvm_fetch_guest_virt after patches)
   15721         [kernel.kallsyms] __srcu_read_lock
   15439  4115   [kvm]             init_emulate_ctxt
   14421 11692   [kvm]             x86_emulate_instruction
   12498         [kernel.kallsyms] __srcu_read_unlock
   12385 11779   [kvm]             __linearize
   12385 13194   [kvm]             decode_operand
    7408  5574   [kvm]             x86_emulate_insn
    6447         [kvm]             kvm_lapic_find_highest_irr
    6390         [kvm_intel]       vmx_handle_exit
    5598  3418   [kvm_intel]       vmx_interrupt_allowed

Honorable mentions among the things I tried that did not have the effect
I hoped for: using __get_user/__put_user to read memory operands, and
simplifying linearize.


Patches 1-6 are various low-hanging fruit, which alone provide a
2-2.5x speedup (higher on simpler instructions).

Patches 7-12 make the emulator cache the host virtual address of memory
operands, thus avoiding walking the guest page tables twice (a condensed
sketch of the resulting fast path follows this overview).

Patches 13-18 avoid wasting time in the memset that reinitializes
struct x86_emulate_ctxt before each emulated instruction.

Patches 19-22 speed up instruction fetching.

Patches 23-25 are the loose ends.
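
As a taste of where patches 7-12 end up, here is a condensed sketch of the
fast path for a memory-destination operand, lifted from the later diffs with
error handling trimmed; it is only an illustration, the patches themselves
are authoritative:

    static int write_memory_operand(struct x86_emulate_ctxt *ctxt,
                                    struct operand *op)
    {
            int rc;

            /*
             * op->hva was filled in by ->memory_prepare(), which walked
             * the guest page tables exactly once for this operand.
             */
            if (likely(!kvm_is_error_hva(op->hva))) {
                    rc = __copy_to_user((void __user *)op->hva, &op->val,
                                        op->bytes);
                    ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
                    if (likely(!rc))
                            return X86EMUL_CONTINUE;
            }

            /* Slow path: the emulated, MMIO-capable write. */
            return segmented_write(ctxt, op->addr.mem, &op->val, op->bytes);
    }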

Bandan Das (6):
  KVM: emulate: move init_decode_cache to emulate.c
  KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks
  KVM: emulate: cleanup decode_modrm
  KVM: emulate: clean up initializations in init_decode_cache
  KVM: emulate: rework seg_override
  KVM: emulate: do not initialize memopp

Paolo Bonzini (19):
  KVM: vmx: speed up emulation of invalid guest state
  KVM: x86: return all bits from get_interrupt_shadow
  KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  KVM: emulate: move around some checks
  KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"
  KVM: emulate: speed up emulated moves
  KVM: emulate: simplify writeback
  KVM: emulate: abstract handling of memory operands
  KVM: export mark_page_dirty_in_slot
  KVM: emulate: introduce memory_prepare callback to speed up memory access
  KVM: emulate: activate memory access optimization
  KVM: emulate: extend memory access optimization to stores
  KVM: emulate: speed up do_insn_fetch
  KVM: emulate: avoid repeated calls to do_insn_fetch_bytes
  KVM: emulate: avoid per-byte copying in instruction fetches
  KVM: emulate: put pointers in the fetch_cache
  KVM: x86: use kvm_read_guest_page for emulator accesses
  KVM: emulate: simplify BitOp handling
  KVM: emulate: fix harmless typo in MMX decoding

 arch/x86/include/asm/kvm_emulate.h |  59 ++++-
 arch/x86/include/asm/kvm_host.h    |   2 +-
 arch/x86/kvm/emulate.c             | 481 ++++++++++++++++++++++---------------
 arch/x86/kvm/svm.c                 |   6 +-
 arch/x86/kvm/trace.h               |   6 +-
 arch/x86/kvm/vmx.c                 |   9 +-
 arch/x86/kvm/x86.c                 | 147 +++++++++---
 include/linux/kvm_host.h           |   6 +
 virt/kvm/kvm_main.c                |  17 +-
 9 files changed, 473 insertions(+), 260 deletions(-)

-- 
1.8.3.1



* [PATCH 01/25] KVM: vmx: speed up emulation of invalid guest state
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 02/25] KVM: x86: return all bits from get_interrupt_shadow Paolo Bonzini
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, the result almost never changes: it can only do so
when a segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.

Emulation performance test results indicate 1650-2600 cycles
for common instructions, versus 2300-3200 before this patch on
a Sandy Bridge Xeon.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332edefc3..a2ae11d162fe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3653,7 +3653,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 
 out:
-	vmx->emulation_required |= emulation_required(vcpu);
+	vmx->emulation_required = emulation_required(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5621,7 +5621,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
 
-	while (!guest_state_valid(vcpu) && count-- != 0) {
+	while (vmx->emulation_required && count-- != 0) {
 		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
 			return handle_interrupt_window(&vmx->vcpu);
 
@@ -5655,7 +5655,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 			schedule();
 	}
 
-	vmx->emulation_required = emulation_required(vcpu);
 out:
 	return ret;
 }
-- 
1.8.3.1




* [PATCH 02/25] KVM: x86: return all bits from get_interrupt_shadow
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 01/25] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 03/25] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.
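
Putting the pieces together, toggle_interruptibility ends up looking roughly
like this (condensed from the hunks below):

    static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
    {
            u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu);

            /*
             * Back-to-back sti (or mov ss;sti) only shadows the first
             * instruction, so drop the new shadow if one was already
             * active before the emulated instruction.
             */
            if (int_shadow & mask)
                    mask = 0;

            /* Common case: no shadow before or after, nothing to write. */
            if (unlikely(int_shadow || mask))
                    kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
    }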

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              |  6 +++---
 arch/x86/kvm/vmx.c              |  4 ++--
 arch/x86/kvm/x86.c              | 10 +++++-----
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63e020be3da7..0b140dc65bee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -717,7 +717,7 @@ struct kvm_x86_ops {
 	int (*handle_exit)(struct kvm_vcpu *vcpu);
 	void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
 	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
-	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
+	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
 	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
 				unsigned char *hypercall_addr);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c5cfea..175702d51176 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -486,14 +486,14 @@ static int is_external_interrupt(u32 info)
 	return info == (SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR);
 }
 
-static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
+static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 ret = 0;
 
 	if (svm->vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK)
-		ret |= KVM_X86_SHADOW_INT_STI | KVM_X86_SHADOW_INT_MOV_SS;
-	return ret & mask;
+		ret = KVM_X86_SHADOW_INT_STI | KVM_X86_SHADOW_INT_MOV_SS;
+	return ret;
 }
 
 static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a2ae11d162fe..57ed3aa0f5d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1940,7 +1940,7 @@ static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	vmcs_writel(GUEST_RFLAGS, rflags);
 }
 
-static u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
+static u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)
 {
 	u32 interruptibility = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
 	int ret = 0;
@@ -1950,7 +1950,7 @@ static u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 	if (interruptibility & GUEST_INTR_STATE_MOV_SS)
 		ret |= KVM_X86_SHADOW_INT_MOV_SS;
 
-	return ret & mask;
+	return ret;
 }
 
 static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 451d6acea808..70faa1089b75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2974,9 +2974,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 		vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft;
 	events->interrupt.nr = vcpu->arch.interrupt.nr;
 	events->interrupt.soft = 0;
-	events->interrupt.shadow =
-		kvm_x86_ops->get_interrupt_shadow(vcpu,
-			KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
+	events->interrupt.shadow = kvm_x86_ops->get_interrupt_shadow(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
 	events->nmi.pending = vcpu->arch.nmi_pending != 0;
@@ -4857,7 +4855,7 @@ static const struct x86_emulate_ops emulate_ops = {
 
 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 {
-	u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask);
+	u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu);
 	/*
 	 * an sti; sti; sequence only disable interrupts for the first
 	 * instruction. So, if the last instruction, be it emulated or
@@ -4865,7 +4863,9 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	if (int_shadow & mask)
+		mask = 0;
+	if (unlikely(int_shadow || mask))
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
 }
 
-- 
1.8.3.1




* [PATCH 03/25] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 01/25] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 02/25] KVM: x86: return all bits from get_interrupt_shadow Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 04/25] KVM: emulate: move around some checks Paolo Bonzini
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.

However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; or b) if the interrupt flag has just been set (instructions other
than STI can set it without enabling an interrupt shadow).

This cuts another 700-900 cycles from the cost of emulating an
instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles
before the patch on kvm-unit-tests, 925-1700 afterwards).
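
In code form, the two cases map onto the following checks (condensed from
the hunks below; rflags holds the flags read before the instruction was
emulated):

    /* a) in toggle_interruptibility: a shadow bit has just gone away. */
    if (unlikely(int_shadow || mask)) {
            kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
            if (!mask)
                    kvm_make_request(KVM_REQ_EVENT, vcpu);
    }

    /*
     * b) after writeback: the interrupt flag has just been turned on
     * (e.g. by popf, which has no interrupt shadow).
     */
    if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
            kvm_make_request(KVM_REQ_EVENT, vcpu);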

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 70faa1089b75..e64b8b5cb6cd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 
 struct kvm_x86_ops *kvm_x86_ops;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -4865,8 +4866,11 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	 */
 	if (int_shadow & mask)
 		mask = 0;
-	if (unlikely(int_shadow || mask))
+	if (unlikely(int_shadow || mask)) {
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+		if (!mask)
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
 }
 
 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
@@ -5092,20 +5096,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
 	return dr6;
 }
 
-static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
+static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
 {
 	struct kvm_run *kvm_run = vcpu->run;
 
 	/*
-	 * Use the "raw" value to see if TF was passed to the processor.
-	 * Note that the new value of the flags has not been saved yet.
+	 * rflags is the old, "raw" value of the flags.  The new value has
+	 * not been saved yet.
 	 *
 	 * This is correct even for TF set by the guest, because "the
 	 * processor will not generate this exception after the instruction
 	 * that sets the TF flag".
 	 */
-	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
-
 	if (unlikely(rflags & X86_EFLAGS_TF)) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
@@ -5272,13 +5274,22 @@ restart:
 		r = EMULATE_DONE;
 
 	if (writeback) {
+		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (r == EMULATE_DONE)
-			kvm_vcpu_check_singlestep(vcpu, &r);
-		kvm_set_rflags(vcpu, ctxt->eflags);
+			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
+		__kvm_set_rflags(vcpu, ctxt->eflags);
+
+		/*
+		 * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
+		 * do nothing, and it will be requested again as soon as
+		 * the shadow expires.  But we still need to check here,
+		 * because POPF has no interrupt shadow.
+		 */
+		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
 	} else
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
 
@@ -7400,12 +7411,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_rflags);
 
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
 	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
 		rflags |= X86_EFLAGS_TF;
 	kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	__kvm_set_rflags(vcpu, rflags);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
-- 
1.8.3.1




* [PATCH 04/25] KVM: emulate: move around some checks
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 03/25] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 05/25] KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

The only purpose of this patch is to make the next patch simpler
to review.  No semantic change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index bc670675223d..63ba8bd82a36 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4381,12 +4381,15 @@ done_prefixes:
 		ctxt->d |= opcode.flags;
 	}
 
+	/* Unrecognised? */
+	if (ctxt->d == 0)
+		return EMULATION_FAILED;
+
 	ctxt->execute = opcode.u.execute;
 	ctxt->check_perm = opcode.check_perm;
 	ctxt->intercept = opcode.intercept;
 
-	/* Unrecognised? */
-	if (ctxt->d == 0 || (ctxt->d & NotImpl))
+	if (ctxt->d & NotImpl)
 		return EMULATION_FAILED;
 
 	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
@@ -4528,19 +4531,19 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->mem_read.pos = 0;
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
+	/* LOCK prefix is allowed only with some instructions */
+	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	/* LOCK prefix is allowed only with some instructions */
-	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
+	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
+	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+			(ctxt->d & Undefined)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
-- 
1.8.3.1




* [PATCH 05/25] KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (3 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 04/25] KVM: emulate: move around some checks Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 06/25] KVM: emulate: speed up emulated moves Paolo Bonzini
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
Make this more effective by adding two more flag bits that say whether the
.intercept and .check_perm fields are valid.  We will reuse these
flags later to avoid initializing fields of the emulate_ctxt struct.

This shaves about 30 cycles off each emulated instruction, which is
approximately a 3% improvement.
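
The shape of the change, in miniature (the complete flag lists are in the
diff below); the point is the single unlikely() guard around all the rare
cases:

    if (unlikely(ctxt->d &
                 (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
            /*
             * All of the rare, instruction-specific checks live here,
             * off the common fast path.
             */
    }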

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 175 ++++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 63ba8bd82a36..4c2ae824e89e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -162,6 +162,8 @@
 #define NoWrite     ((u64)1 << 45)  /* No writeback */
 #define SrcWrite    ((u64)1 << 46)  /* Write back src operand */
 #define NoMod	    ((u64)1 << 47)  /* Mod field is ignored */
+#define Intercept   ((u64)1 << 48)  /* Has valid intercept field */
+#define CheckPerm   ((u64)1 << 49)  /* Has valid check_perm field */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -3539,9 +3541,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 }
 
 #define D(_y) { .flags = (_y) }
-#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
-#define DIP(_y, _i, _p) { .flags = (_y), .intercept = x86_intercept_##_i, \
-		      .check_perm = (_p) }
+#define DI(_y, _i) { .flags = (_y)|Intercept, .intercept = x86_intercept_##_i }
+#define DIP(_y, _i, _p) { .flags = (_y)|Intercept|CheckPerm, \
+		      .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define N    D(NotImpl)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
@@ -3550,10 +3552,10 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 #define II(_f, _e, _i) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
+	{ .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
 #define IIP(_f, _e, _i, _p) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i, \
-	  .check_perm = (_p) }
+	{ .flags = (_f)|Intercept|CheckPerm, .u.execute = (_e), \
+	  .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define GP(_f, _g) { .flags = ((_f) | Prefix), .u.gprefix = (_g) }
 
 #define D2bv(_f)      D((_f) | ByteOp), D(_f)
@@ -4386,29 +4388,37 @@ done_prefixes:
 		return EMULATION_FAILED;
 
 	ctxt->execute = opcode.u.execute;
-	ctxt->check_perm = opcode.check_perm;
-	ctxt->intercept = opcode.intercept;
 
-	if (ctxt->d & NotImpl)
-		return EMULATION_FAILED;
+	if (unlikely(ctxt->d &
+		     (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
+		/*
+		 * These are copied unconditionally here, and checked unconditionally
+		 * in x86_emulate_insn.
+		 */
+		ctxt->check_perm = opcode.check_perm;
+		ctxt->intercept = opcode.intercept;
 
-	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
-		return EMULATION_FAILED;
+		if (ctxt->d & NotImpl)
+			return EMULATION_FAILED;
 
-	if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
-		ctxt->op_bytes = 8;
+		if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+			return EMULATION_FAILED;
 
-	if (ctxt->d & Op3264) {
-		if (mode == X86EMUL_MODE_PROT64)
+		if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
 			ctxt->op_bytes = 8;
-		else
-			ctxt->op_bytes = 4;
-	}
 
-	if (ctxt->d & Sse)
-		ctxt->op_bytes = 16;
-	else if (ctxt->d & Mmx)
-		ctxt->op_bytes = 8;
+		if (ctxt->d & Op3264) {
+			if (mode == X86EMUL_MODE_PROT64)
+				ctxt->op_bytes = 8;
+			else
+				ctxt->op_bytes = 4;
+		}
+
+		if (ctxt->d & Sse)
+			ctxt->op_bytes = 16;
+		else if (ctxt->d & Mmx)
+			ctxt->op_bytes = 8;
+	}
 
 	/* ModRM and SIB bytes. */
 	if (ctxt->d & ModRM) {
@@ -4542,75 +4552,78 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto done;
 	}
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
-	    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
-		rc = emulate_nm(ctxt);
-		goto done;
-	}
+	if (unlikely(ctxt->d &
+		     (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
+		if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+				(ctxt->d & Undefined)) {
+			rc = emulate_ud(ctxt);
+			goto done;
+		}
 
-	if (ctxt->d & Mmx) {
-		rc = flush_pending_x87_faults(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
+		    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
+			rc = emulate_ud(ctxt);
 			goto done;
-		/*
-		 * Now that we know the fpu is exception safe, we can fetch
-		 * operands from it.
-		 */
-		fetch_possible_mmx_operand(ctxt, &ctxt->src);
-		fetch_possible_mmx_operand(ctxt, &ctxt->src2);
-		if (!(ctxt->d & Mov))
-			fetch_possible_mmx_operand(ctxt, &ctxt->dst);
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_PRE_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
+			rc = emulate_nm(ctxt);
 			goto done;
-	}
+		}
 
-	/* Privileged instruction can be executed only in CPL=0 */
-	if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
-		rc = emulate_gp(ctxt, 0);
-		goto done;
-	}
+		if (ctxt->d & Mmx) {
+			rc = flush_pending_x87_faults(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+			/*
+			 * Now that we know the fpu is exception safe, we can fetch
+			 * operands from it.
+			 */
+			fetch_possible_mmx_operand(ctxt, &ctxt->src);
+			fetch_possible_mmx_operand(ctxt, &ctxt->src2);
+			if (!(ctxt->d & Mov))
+				fetch_possible_mmx_operand(ctxt, &ctxt->dst);
+		}
 
-	/* Instruction can only be executed in protected mode */
-	if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_PRE_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
 
-	/* Do instruction specific permission checks */
-	if (ctxt->check_perm) {
-		rc = ctxt->check_perm(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		/* Privileged instruction can be executed only in CPL=0 */
+		if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
+			rc = emulate_gp(ctxt, 0);
 			goto done;
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_POST_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		/* Instruction can only be executed in protected mode */
+		if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
+			rc = emulate_ud(ctxt);
 			goto done;
-	}
+		}
 
-	if (ctxt->rep_prefix && (ctxt->d & String)) {
-		/* All REP prefixes have the same first termination condition */
-		if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
-			ctxt->eip = ctxt->_eip;
-			goto done;
+		/* Do instruction specific permission checks */
+		if (ctxt->check_perm) {
+			rc = ctxt->check_perm(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_POST_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (ctxt->rep_prefix && (ctxt->d & String)) {
+			/* All REP prefixes have the same first termination condition */
+			if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
+				ctxt->eip = ctxt->_eip;
+				goto done;
+			}
 		}
 	}
 
-- 
1.8.3.1




* [PATCH 06/25] KVM: emulate: speed up emulated moves
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (4 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 05/25] KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 07/25] KVM: emulate: simplify writeback Paolo Bonzini
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (with a constant size the compiler expands the
copy inline) gains about 200 cycles on kvm-unit-tests for register-to-register
moves, and makes them about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.
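
For reference, write_register_operand (not touched by this patch) truncates
the store according to op->bytes, roughly as below, which is why the blind
16-byte copy is safe for register destinations.  This is a paraphrase of the
existing helper, not part of the diff:

    static void write_register_operand(struct operand *op)
    {
            /* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
            switch (op->bytes) {
            case 1:
                    *(u8 *)op->addr.reg = (u8)op->val;
                    break;
            case 2:
                    *(u16 *)op->addr.reg = (u16)op->val;
                    break;
            case 4:
                    *op->addr.reg = (u32)op->val;
                    break;
            case 8:
                    *op->addr.reg = op->val;
                    break;
            }
    }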

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 2 +-
 arch/x86/kvm/emulate.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index ffa2671a7f2f..46725e8aa8cb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct operand {
 	union {
 		unsigned long val;
 		u64 val64;
-		char valptr[sizeof(unsigned long) + 2];
+		char valptr[sizeof(sse128_t)];
 		sse128_t vec_val;
 		u64 mm_val;
 		void *data;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4c2ae824e89e..5f902d40740c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2983,7 +2983,7 @@ static int em_rdpmc(struct x86_emulate_ctxt *ctxt)
 
 static int em_mov(struct x86_emulate_ctxt *ctxt)
 {
-	memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
+	memcpy(ctxt->dst.valptr, ctxt->src.valptr, sizeof(ctxt->src.valptr));
 	return X86EMUL_CONTINUE;
 }
 
-- 
1.8.3.1




* [PATCH 07/25] KVM: emulate: simplify writeback
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (5 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 06/25] KVM: emulate: speed up emulated moves Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 08/25] KVM: emulate: abstract handling of memory operands Paolo Bonzini
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

The "if/return" checks are useless, because we return X86EMUL_CONTINUE
anyway if we do not return.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5f902d40740c..a1daf52fae58 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1585,34 +1585,28 @@ static void write_register_operand(struct operand *op)
 
 static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
 {
-	int rc;
-
 	switch (op->type) {
 	case OP_REG:
 		write_register_operand(op);
 		break;
 	case OP_MEM:
 		if (ctxt->lock_prefix)
-			rc = segmented_cmpxchg(ctxt,
+			return segmented_cmpxchg(ctxt,
+						 op->addr.mem,
+						 &op->orig_val,
+						 &op->val,
+						 op->bytes);
+		else
+			return segmented_write(ctxt,
 					       op->addr.mem,
-					       &op->orig_val,
 					       &op->val,
 					       op->bytes);
-		else
-			rc = segmented_write(ctxt,
-					     op->addr.mem,
-					     &op->val,
-					     op->bytes);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
 		break;
 	case OP_MEM_STR:
-		rc = segmented_write(ctxt,
-				op->addr.mem,
-				op->data,
-				op->bytes * op->count);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
+		return segmented_write(ctxt,
+				       op->addr.mem,
+				       op->data,
+				       op->bytes * op->count);
 		break;
 	case OP_XMM:
 		write_sse_reg(ctxt, &op->vec_val, op->addr.xmm);
-- 
1.8.3.1




* [PATCH 08/25] KVM: emulate: abstract handling of memory operands
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (6 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 07/25] KVM: emulate: simplify writeback Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 09/25] KVM: export mark_page_dirty_in_slot Paolo Bonzini
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Abstract the pre-execution processing and writeback of memory operands
in new functions.  We will soon do some work before execution even for
move destination, so call the function in that case too; but not for
the memory operand of lea, invlpg etc.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a1daf52fae58..7e9dc2d6fd44 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1564,6 +1564,29 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 	return __load_segment_descriptor(ctxt, selector, seg, cpl, false);
 }
 
+static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
+				  struct operand *op)
+{
+	return segmented_read(ctxt, op->addr.mem, &op->val, op->bytes);
+}
+
+static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
+				  struct operand *op)
+{
+	return segmented_cmpxchg(ctxt, op->addr.mem,
+				 &op->orig_val,
+				 &op->val,
+				 op->bytes);
+}
+
+static int write_memory_operand(struct x86_emulate_ctxt *ctxt,
+				struct operand *op)
+{
+	return segmented_write(ctxt, op->addr.mem,
+			       &op->val,
+			       op->bytes);
+}
+
 static void write_register_operand(struct operand *op)
 {
 	/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
@@ -1591,16 +1614,9 @@ static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
 		break;
 	case OP_MEM:
 		if (ctxt->lock_prefix)
-			return segmented_cmpxchg(ctxt,
-						 op->addr.mem,
-						 &op->orig_val,
-						 &op->val,
-						 op->bytes);
+			return cmpxchg_memory_operand(ctxt, op);
 		else
-			return segmented_write(ctxt,
-					       op->addr.mem,
-					       &op->val,
-					       op->bytes);
+			return write_memory_operand(ctxt, op);
 		break;
 	case OP_MEM_STR:
 		return segmented_write(ctxt,
@@ -4622,16 +4638,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = segmented_read(ctxt, ctxt->src.addr.mem,
-				    ctxt->src.valptr, ctxt->src.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->src);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = segmented_read(ctxt, ctxt->src2.addr.mem,
-				    &ctxt->src2.val, ctxt->src2.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4642,8 +4656,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
 		/* optimisation - avoid slow emulated read if Mov */
-		rc = segmented_read(ctxt, ctxt->dst.addr.mem,
-				   &ctxt->dst.val, ctxt->dst.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->dst);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
-- 
1.8.3.1




* [PATCH 09/25] KVM: export mark_page_dirty_in_slot
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (7 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 08/25] KVM: emulate: abstract handling of memory operands Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 10/25] KVM: emulate: introduce memory_prepare callback to speed up memory access Paolo Bonzini
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 12 +++++-------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 970c68197c69..5be9805b0aeb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -583,6 +583,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c86be0f983db..2f9bc20ae2a7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -99,8 +99,6 @@ static void update_memslots(struct kvm_memslots *slots,
 			    struct kvm_memory_slot *new, u64 last_generation);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
-static void mark_page_dirty_in_slot(struct kvm *kvm,
-				    struct kvm_memory_slot *memslot, gfn_t gfn);
 
 __visible bool kvm_rebooting;
 EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -1585,7 +1583,7 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	r = __copy_to_user((void __user *)ghc->hva, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+	mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);
 
 	return 0;
 }
@@ -1643,9 +1641,8 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-static void mark_page_dirty_in_slot(struct kvm *kvm,
-				    struct kvm_memory_slot *memslot,
-				    gfn_t gfn)
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
+			      gfn_t gfn)
 {
 	if (memslot && memslot->dirty_bitmap) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
@@ -1653,13 +1650,14 @@ static void mark_page_dirty_in_slot(struct kvm *kvm,
 		set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
 }
+EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
 
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
 
-- 
1.8.3.1




* [PATCH 10/25] KVM: emulate: introduce memory_prepare callback to speed up memory access
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (8 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 09/25] KVM: export mark_page_dirty_in_slot Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:58 ` [PATCH 11/25] KVM: emulate: activate memory access optimization Paolo Bonzini
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Emulating a RMW instruction currently walks the guest page tables twice:
once for the read and once for the write.  To avoid this, cache the host
virtual address of the operand and access it directly.

In fact, it turns out that the optimizations we can do
actually benefit all memory accesses.
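
The intended calling pattern on the emulator side, wired up by the next
patches, is roughly "prepare once, copy through the returned HVA, then
finish".  A condensed sketch of the read side, using the callback names
introduced here (read_from_user is the cache-aware __copy_from_user wrapper
added by the next patch):

    rc = ctxt->ops->memory_prepare(ctxt, gva, size, &ctxt->exception,
                                   write, &op->opaque, &op->hva);
    if (rc != X86EMUL_CONTINUE)
            return rc;

    if (likely(!kvm_is_error_hva(op->hva))) {
            rc = read_from_user(ctxt, op->hva, &op->val, size);
            if (!write)
                    ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
            if (likely(rc == X86EMUL_CONTINUE))
                    return X86EMUL_CONTINUE;
    }

    /* No usable HVA (e.g. MMIO): fall back to the emulated access. */
    return read_emulated(ctxt, gva, &op->val, size);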

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 26 +++++++++++++++
 arch/x86/kvm/x86.c                 | 67 ++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h           |  5 +++
 virt/kvm/kvm_main.c                |  5 ---
 4 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 46725e8aa8cb..1aa2adf0bb1a 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -167,6 +167,32 @@ struct x86_emulate_ops {
 				const void *new,
 				unsigned int bytes,
 				struct x86_exception *fault);
+
+	/*
+	 * memory_prepare: Prepare userspace access fastpath.
+	 *   @addr:     [IN ] Linear address to access.
+	 *   @bytes:    [IN ] Number of bytes to access.
+	 *   @write:    [IN ] True if *p_hva will be written to.
+	 *   @p_opaque: [OUT] Value passed back to memory_finish.
+	 *   @p_hva:    [OUT] Host virtual address for __copy_from/to_user.
+	 */
+	int (*memory_prepare)(struct x86_emulate_ctxt *ctxt,
+			      unsigned long addr,
+			      unsigned int bytes,
+			      struct x86_exception *exception,
+			      bool write,
+			      void **p_opaque,
+			      unsigned long *p_hva);
+
+	/*
+	 * memory_finish: Complete userspace access fastpath.
+	 *   @opaque: [OUT] Value passed back from memory_prepare.
+	 *   @hva:    [OUT] Host virtual address computed in memory_prepare.
+	 */
+	void (*memory_finish)(struct x86_emulate_ctxt *ctxt,
+			      void *p_opaque,
+			      unsigned long p_hva);
+
 	void (*invlpg)(struct x86_emulate_ctxt *ctxt, ulong addr);
 
 	int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e64b8b5cb6cd..02678c2f3721 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4279,6 +4279,71 @@ static const struct read_write_emulator_ops write_emultor = {
 	.write = true,
 };
 
+static int emulator_memory_prepare(struct x86_emulate_ctxt *ctxt,
+				   unsigned long addr,
+				   unsigned int bytes,
+				   struct x86_exception *exception,
+				   bool write,
+				   void **p_opaque,
+				   unsigned long *p_hva)
+{
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	struct kvm_memory_slot *memslot;
+	int ret;
+	gpa_t gpa;
+	gfn_t gfn;
+	unsigned long hva;
+	
+	if (unlikely(((addr + bytes - 1) ^ addr) & PAGE_MASK))
+		goto no_hva;
+
+	ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, true);
+	if (ret != 0) {
+		if (ret < 0)
+			return X86EMUL_PROPAGATE_FAULT;
+		goto no_hva;
+	}
+
+	/* A (heavily) simplified version of kvm_gfn_to_hva_cache_init.  */
+	gfn = gpa >> PAGE_SHIFT;
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot)
+		goto no_hva;
+
+	if (write) {
+		if (memslot_is_readonly(memslot))
+			goto no_hva;
+
+		*p_opaque = memslot->dirty_bitmap ? memslot : NULL;
+	}
+
+	hva = __gfn_to_hva_memslot(memslot, gfn);
+	if (kvm_is_error_hva(hva))
+		goto no_hva;
+
+	*p_hva = hva + offset_in_page(gpa);
+	return X86EMUL_CONTINUE;
+
+no_hva:
+	*p_hva = KVM_HVA_ERR_BAD;
+	return X86EMUL_CONTINUE;
+}
+
+static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
+				   void *opaque,
+				   unsigned long hva)
+{
+	struct kvm_memory_slot *memslot;
+	gfn_t gfn;
+
+	if (!opaque)
+		return;
+
+	memslot = opaque;
+	gfn = hva_to_gfn_memslot(hva, memslot);
+	mark_page_dirty_in_slot(memslot, gfn);
+}
+
 static int emulator_read_write_onepage(unsigned long addr, void *val,
 				       unsigned int bytes,
 				       struct x86_exception *exception,
@@ -4823,6 +4888,8 @@ static const struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.write_std           = kvm_write_guest_virt_system,
 	.fetch               = kvm_fetch_guest_virt,
+	.memory_prepare      = emulator_memory_prepare,
+	.memory_finish       = emulator_memory_finish,
 	.read_emulated       = emulator_read_emulated,
 	.write_emulated      = emulator_write_emulated,
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5be9805b0aeb..91f303dd0fe5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -842,6 +842,11 @@ search_memslots(struct kvm_memslots *slots, gfn_t gfn)
 	return NULL;
 }
 
+static inline bool memslot_is_readonly(struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline struct kvm_memory_slot *
 __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2f9bc20ae2a7..09b19afb2c11 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1028,11 +1028,6 @@ out:
 	return size;
 }
 
-static bool memslot_is_readonly(struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
-- 
1.8.3.1




* [PATCH 11/25] KVM: emulate: activate memory access optimization
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (9 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 10/25] KVM: emulate: introduce memory_prepare callback to speed up memory access Paolo Bonzini
@ 2014-06-09 12:58 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 12/25] KVM: emulate: extend memory access optimization to stores Paolo Bonzini
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

memory_prepare lets us replace segmented_read/segmented_write with direct
calls to __copy_from_user/__copy_to_user.  For RMW instructions, we also
avoid double walking of the page tables.

This saves about 50 cycles (5%) on arithmetic with a memory source
operand, and up to 300 cycles (20%) on arithmetic with a memory
destination operand.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  2 ++
 arch/x86/kvm/emulate.c             | 72 +++++++++++++++++++++++++++++++++++---
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 1aa2adf0bb1a..f596976a3ab2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -263,6 +263,8 @@ struct operand {
 		u64 mm_val;
 		void *data;
 	};
+	unsigned long hva;
+	void *opaque;
 };
 
 struct fetch_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7e9dc2d6fd44..594cb560947c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1260,6 +1260,29 @@ read_cached:
 	return X86EMUL_CONTINUE;
 }
 
+static int read_from_user(struct x86_emulate_ctxt *ctxt,
+			  unsigned long hva, void *dest, unsigned size)
+{
+	int rc;
+	struct read_cache *mc = &ctxt->mem_read;
+
+	if (mc->pos < mc->end)
+		goto read_cached;
+
+	WARN_ON((mc->end + size) >= sizeof(mc->data));
+
+	rc = __copy_from_user(mc->data + mc->end, (void __user *)hva, size);
+	if (rc < 0)
+		return X86EMUL_UNHANDLEABLE;
+
+	mc->end += size;
+
+read_cached:
+	memcpy(dest, mc->data + mc->pos, size);
+	mc->pos += size;
+	return X86EMUL_CONTINUE;
+}
+
 static int segmented_read(struct x86_emulate_ctxt *ctxt,
 			  struct segmented_address addr,
 			  void *data,
@@ -1565,9 +1588,36 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 }
 
 static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
-				  struct operand *op)
+				  struct operand *op,
+				  bool write)
 {
-	return segmented_read(ctxt, op->addr.mem, &op->val, op->bytes);
+	int rc;
+	unsigned long gva;
+	unsigned int size = op->bytes;
+
+	rc = linearize(ctxt, op->addr.mem, size, write, &gva);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	rc = ctxt->ops->memory_prepare(ctxt, gva, size,
+					&ctxt->exception, true,
+					&op->opaque, &op->hva);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	if (likely(!kvm_is_error_hva(op->hva))) {
+		rc = read_from_user(ctxt, op->hva, &op->val, size);
+		if (!write)
+			ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
+
+		if (likely(rc == X86EMUL_CONTINUE))
+			return X86EMUL_CONTINUE;
+
+		/* Should not happen.  */
+		op->hva = KVM_HVA_ERR_BAD;
+	}
+
+	return read_emulated(ctxt, gva, &op->val, size);
 }
 
 static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
@@ -1582,6 +1632,17 @@ static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
 static int write_memory_operand(struct x86_emulate_ctxt *ctxt,
 				struct operand *op)
 {
+	int rc;
+
+	if (likely(!kvm_is_error_hva(op->hva))) {
+		rc = __copy_to_user((void __user *)op->hva, &op->val,
+				    op->bytes);
+		ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
+
+		if (likely(!rc))
+			return X86EMUL_CONTINUE;
+	}
+
 	return segmented_write(ctxt, op->addr.mem,
 			       &op->val,
 			       op->bytes);
@@ -4638,14 +4699,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src);
+		rc = prepare_memory_operand(ctxt, &ctxt->src, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src2);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4656,7 +4717,8 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
 		/* optimisation - avoid slow emulated read if Mov */
-		rc = prepare_memory_operand(ctxt, &ctxt->dst);
+		rc = prepare_memory_operand(ctxt, &ctxt->dst,
+					    !(ctxt->d & NoWrite));
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
-- 
1.8.3.1




* [PATCH 12/25] KVM: emulate: extend memory access optimization to stores
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (10 preceding siblings ...)
  2014-06-09 12:58 ` [PATCH 11/25] KVM: emulate: activate memory access optimization Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 18:40   ` Bandan Das
  2014-06-09 12:59 ` [PATCH 13/25] KVM: emulate: move init_decode_cache to emulate.c Paolo Bonzini
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Even on a store the optimization saves about 50 clock cycles,
mostly because the jump in write_memory_operand becomes much more
predictable.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 594cb560947c..eaf0853ffaf9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1589,7 +1589,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 
 static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
 				  struct operand *op,
-				  bool write)
+				  bool read, bool write)
 {
 	int rc;
 	unsigned long gva;
@@ -1605,6 +1605,10 @@ static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
+	/* optimisation - avoid slow emulated read if Mov */
+	if (!read)
+		return X86EMUL_CONTINUE;
+
 	if (likely(!kvm_is_error_hva(op->hva))) {
 		rc = read_from_user(ctxt, op->hva, &op->val, size);
 		if (!write)
@@ -4699,14 +4703,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src, false);
+		rc = prepare_memory_operand(ctxt, &ctxt->src, true, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src2, false);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2, true, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4715,9 +4719,9 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto special_insn;
 
 
-	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
-		/* optimisation - avoid slow emulated read if Mov */
+	if (ctxt->dst.type == OP_MEM) {
 		rc = prepare_memory_operand(ctxt, &ctxt->dst,
+					    !(ctxt->d & Mov),
 					    !(ctxt->d & NoWrite));
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
-- 
1.8.3.1




* [PATCH 13/25] KVM: emulate: move init_decode_cache to emulate.c
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (11 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 12/25] KVM: emulate: extend memory access optimization to stores Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 14/25] KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks Paolo Bonzini
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

Core emulator functions all belong in emulate.c; x86.c should have no
knowledge of emulator internals.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/kvm/emulate.c             | 13 +++++++++++++
 arch/x86/kvm/x86.c                 | 13 -------------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f596976a3ab2..b8456835eddb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -436,6 +436,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt);
 #define EMULATION_OK 0
 #define EMULATION_RESTART 1
 #define EMULATION_INTERCEPTED 2
+void init_decode_cache(struct x86_emulate_ctxt *ctxt);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 			 u16 tss_selector, int idt_index, int reason,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index eaf0853ffaf9..34a04ba26c23 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4608,6 +4608,19 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 	return X86EMUL_CONTINUE;
 }
 
+void init_decode_cache(struct x86_emulate_ctxt *ctxt)
+{
+	memset(&ctxt->opcode_len, 0,
+	       (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
+
+	ctxt->fetch.start = 0;
+	ctxt->fetch.end = 0;
+	ctxt->io_read.pos = 0;
+	ctxt->io_read.end = 0;
+	ctxt->mem_read.pos = 0;
+	ctxt->mem_read.end = 0;
+}
+
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 {
 	const struct x86_emulate_ops *ops = ctxt->ops;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02678c2f3721..401bdef24d54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4952,19 +4952,6 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 		kvm_queue_exception(vcpu, ctxt->exception.vector);
 }
 
-static void init_decode_cache(struct x86_emulate_ctxt *ctxt)
-{
-	memset(&ctxt->opcode_len, 0,
-	       (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
-
-	ctxt->fetch.start = 0;
-	ctxt->fetch.end = 0;
-	ctxt->io_read.pos = 0;
-	ctxt->io_read.end = 0;
-	ctxt->mem_read.pos = 0;
-	ctxt->mem_read.end = 0;
-}
-
 static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
-- 
1.8.3.1




* [PATCH 14/25] KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (12 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 13/25] KVM: emulate: move init_decode_cache to emulate.c Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 15/25] KVM: emulate: cleanup decode_modrm Paolo Bonzini
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

The same information can be gleaned from ctxt->d, which avoids having
to zero/NULL-initialize intercept and check_perm.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 34a04ba26c23..75e26bd4a21b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4673,7 +4673,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 				fetch_possible_mmx_operand(ctxt, &ctxt->dst);
 		}
 
-		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+		if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_PRE_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -4693,13 +4693,13 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		}
 
 		/* Do instruction specific permission checks */
-		if (ctxt->check_perm) {
+		if (ctxt->d & CheckPerm) {
 			rc = ctxt->check_perm(ctxt);
 			if (rc != X86EMUL_CONTINUE)
 				goto done;
 		}
 
-		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+		if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_POST_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -4743,7 +4743,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 special_insn:
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+	if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
 		rc = emulator_check_intercept(ctxt, ctxt->intercept,
 					      X86_ICPT_POST_MEMACCESS);
 		if (rc != X86EMUL_CONTINUE)
-- 
1.8.3.1




* [PATCH 15/25] KVM: emulate: cleanup decode_modrm
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (13 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 14/25] KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 16/25] KVM: emulate: clean up initializations in init_decode_cache Paolo Bonzini
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

Remove the if conditional - that will help us avoid
an "else initialize to 0" branch.  Also, rearrange the operators
for slightly better code.
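
For illustration only (not part of the patch), a worked example showing
that the rearranged operators are equivalent; 0x45 is just a sample REX
byte with REX.R and REX.B set:

/*
 * rex_prefix = 0x45 (REX.R=1, REX.X=0, REX.B=1):
 *   REX.R:  old (0x45 & 4) << 1 == 8      new (0x45 << 1) & 8 == 8
 *   REX.X:  old (0x45 & 2) << 2 == 0      new (0x45 << 2) & 8 == 0
 *   REX.B:  old (0x45 & 1) << 3 == 8      new (0x45 << 3) & 8 == 8
 * rex_prefix = 0 (no REX prefix): every new expression evaluates to 0,
 * which is exactly what the removed "else initialize to 0" provided.
 */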

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 75e26bd4a21b..c92b1cfbb430 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1065,19 +1065,17 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			struct operand *op)
 {
 	u8 sib;
-	int index_reg = 0, base_reg = 0, scale;
+	int index_reg, base_reg, scale;
 	int rc = X86EMUL_CONTINUE;
 	ulong modrm_ea = 0;
 
-	if (ctxt->rex_prefix) {
-		ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1;	/* REX.R */
-		index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */
-		ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */
-	}
+	ctxt->modrm_reg = ((ctxt->rex_prefix << 1) & 8); /* REX.R */
+	index_reg = (ctxt->rex_prefix << 2) & 8; /* REX.X */
+	base_reg = (ctxt->rex_prefix << 3) & 8; /* REX.B */
 
-	ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;
+	ctxt->modrm_mod = (ctxt->modrm & 0xc0) >> 6;
 	ctxt->modrm_reg |= (ctxt->modrm & 0x38) >> 3;
-	ctxt->modrm_rm |= (ctxt->modrm & 0x07);
+	ctxt->modrm_rm = base_reg | (ctxt->modrm & 0x07);
 	ctxt->modrm_seg = VCPU_SREG_DS;
 
 	if (ctxt->modrm_mod == 3 || (ctxt->d & NoMod)) {
-- 
1.8.3.1




* [PATCH 16/25] KVM: emulate: clean up initializations in init_decode_cache
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (14 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 15/25] KVM: emulate: cleanup decode_modrm Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 17/25] KVM: emulate: rework seg_override Paolo Bonzini
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

A lot of initializations are unnecessary as the fields get set to
appropriate values before actually being used.  Also optimize the
placement of fields in x86_emulate_ctxt so that the ones cleared
together by init_decode_cache are contiguous.
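
A minimal, generic sketch of the layout trick (struct and field names
below are made up, not the real kvm_emulate.h layout): keep the fields
that must be zeroed before every decode contiguous, so one memset over
an address range covers all of them.

#include <string.h>
#include <stdint.h>
#include <stdbool.h>

struct decode_state {
        /* fields below are cleared together before every decode */
        bool rip_rel;
        uint8_t prefix;
        uint32_t dirty_mask;
        /* fields from here on are always rewritten during decode */
        uint8_t modrm_byte;
};

static void reset_decode_state(struct decode_state *s)
{
        /* one memset instead of one store per field */
        memset(&s->rip_rel, 0,
               (char *)&s->modrm_byte - (char *)&s->rip_rel);
}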

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 20 +++++++++++---------
 arch/x86/kvm/emulate.c             |  7 ++-----
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b8456835eddb..8f03e7158800 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -315,30 +315,32 @@ struct x86_emulate_ctxt {
 	u8 opcode_len;
 	u8 b;
 	u8 intercept;
-	u8 lock_prefix;
-	u8 rep_prefix;
 	u8 op_bytes;
 	u8 ad_bytes;
-	u8 rex_prefix;
 	struct operand src;
 	struct operand src2;
 	struct operand dst;
-	bool has_seg_override;
-	u8 seg_override;
-	u64 d;
 	int (*execute)(struct x86_emulate_ctxt *ctxt);
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
+	bool has_seg_override;
+	bool rip_relative;
+	u8 rex_prefix;
+	u8 lock_prefix;
+	u8 rep_prefix;
+	u8 seg_override;
+	/* bitmaps of registers in _regs[] that can be read */
+	u32 regs_valid;
+	/* bitmaps of registers in _regs[] that have been written */
+	u32 regs_dirty;
 	/* modrm */
 	u8 modrm;
 	u8 modrm_mod;
 	u8 modrm_reg;
 	u8 modrm_rm;
 	u8 modrm_seg;
-	bool rip_relative;
+	u64 d;
 	unsigned long _eip;
 	struct operand memop;
-	u32 regs_valid;  /* bitmaps of registers in _regs[] that can be read */
-	u32 regs_dirty;  /* bitmaps of registers in _regs[] that have been written */
 	/* Fields above regs are cleared together. */
 	unsigned long _regs[NR_VCPU_REGS];
 	struct operand *memopp;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c92b1cfbb430..cef069e17e2e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4608,14 +4608,11 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 void init_decode_cache(struct x86_emulate_ctxt *ctxt)
 {
-	memset(&ctxt->opcode_len, 0,
-	       (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
+	memset(&ctxt->has_seg_override, 0,
+	       (void *)&ctxt->modrm - (void *)&ctxt->has_seg_override);
 
-	ctxt->fetch.start = 0;
-	ctxt->fetch.end = 0;
 	ctxt->io_read.pos = 0;
 	ctxt->io_read.end = 0;
-	ctxt->mem_read.pos = 0;
 	ctxt->mem_read.end = 0;
 }
 
-- 
1.8.3.1




* [PATCH 17/25] KVM: emulate: rework seg_override
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (15 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 16/25] KVM: emulate: clean up initializations in init_decode_cache Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 18/25] KVM: emulate: do not initialize memopp Paolo Bonzini
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  3 +--
 arch/x86/kvm/emulate.c             | 41 +++++++++++++++-----------------------
 2 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 8f03e7158800..268bbd4aae99 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -322,12 +322,10 @@ struct x86_emulate_ctxt {
 	struct operand dst;
 	int (*execute)(struct x86_emulate_ctxt *ctxt);
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
-	bool has_seg_override;
 	bool rip_relative;
 	u8 rex_prefix;
 	u8 lock_prefix;
 	u8 rep_prefix;
-	u8 seg_override;
 	/* bitmaps of registers in _regs[] that can be read */
 	u32 regs_valid;
 	/* bitmaps of registers in _regs[] that have been written */
@@ -338,6 +336,7 @@ struct x86_emulate_ctxt {
 	u8 modrm_reg;
 	u8 modrm_rm;
 	u8 modrm_seg;
+	u8 seg_override;
 	u64 d;
 	unsigned long _eip;
 	struct operand memop;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index cef069e17e2e..05f7bb416fc2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -513,12 +513,6 @@ static u32 desc_limit_scaled(struct desc_struct *desc)
 	return desc->g ? (limit << 12) | 0xfff : limit;
 }
 
-static void set_seg_override(struct x86_emulate_ctxt *ctxt, int seg)
-{
-	ctxt->has_seg_override = true;
-	ctxt->seg_override = seg;
-}
-
 static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
 {
 	if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS)
@@ -527,14 +521,6 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
 	return ctxt->ops->get_cached_segment_base(ctxt, seg);
 }
 
-static unsigned seg_override(struct x86_emulate_ctxt *ctxt)
-{
-	if (!ctxt->has_seg_override)
-		return 0;
-
-	return ctxt->seg_override;
-}
-
 static int emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
 			     u32 error, bool valid)
 {
@@ -4243,7 +4229,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
 		op->addr.mem.ea =
 			register_address(ctxt, reg_read(ctxt, VCPU_REGS_RSI));
-		op->addr.mem.seg = seg_override(ctxt);
+		op->addr.mem.seg = ctxt->seg_override;
 		op->val = 0;
 		op->count = 1;
 		break;
@@ -4254,7 +4240,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 			register_address(ctxt,
 				reg_read(ctxt, VCPU_REGS_RBX) +
 				(reg_read(ctxt, VCPU_REGS_RAX) & 0xff));
-		op->addr.mem.seg = seg_override(ctxt);
+		op->addr.mem.seg = ctxt->seg_override;
 		op->val = 0;
 		break;
 	case OpImmFAddr:
@@ -4301,6 +4287,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	int mode = ctxt->mode;
 	int def_op_bytes, def_ad_bytes, goffset, simd_prefix;
 	bool op_prefix = false;
+	bool has_seg_override = false;
 	struct opcode opcode;
 
 	ctxt->memop.type = OP_NONE;
@@ -4354,11 +4341,13 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 		case 0x2e:	/* CS override */
 		case 0x36:	/* SS override */
 		case 0x3e:	/* DS override */
-			set_seg_override(ctxt, (ctxt->b >> 3) & 3);
+			has_seg_override = true;
+			ctxt->seg_override = (ctxt->b >> 3) & 3;
 			break;
 		case 0x64:	/* FS override */
 		case 0x65:	/* GS override */
-			set_seg_override(ctxt, ctxt->b & 7);
+			has_seg_override = true;
+			ctxt->seg_override = ctxt->b & 7;
 			break;
 		case 0x40 ... 0x4f: /* REX */
 			if (mode != X86EMUL_MODE_PROT64)
@@ -4496,17 +4485,19 @@ done_prefixes:
 	/* ModRM and SIB bytes. */
 	if (ctxt->d & ModRM) {
 		rc = decode_modrm(ctxt, &ctxt->memop);
-		if (!ctxt->has_seg_override)
-			set_seg_override(ctxt, ctxt->modrm_seg);
+		if (!has_seg_override) {
+			has_seg_override = true;
+			ctxt->seg_override = ctxt->modrm_seg;
+		}
 	} else if (ctxt->d & MemAbs)
 		rc = decode_abs(ctxt, &ctxt->memop);
 	if (rc != X86EMUL_CONTINUE)
 		goto done;
 
-	if (!ctxt->has_seg_override)
-		set_seg_override(ctxt, VCPU_SREG_DS);
+	if (!has_seg_override)
+		ctxt->seg_override = VCPU_SREG_DS;
 
-	ctxt->memop.addr.mem.seg = seg_override(ctxt);
+	ctxt->memop.addr.mem.seg = ctxt->seg_override;
 
 	if (ctxt->memop.type == OP_MEM && ctxt->ad_bytes != 8)
 		ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea;
@@ -4608,8 +4599,8 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 void init_decode_cache(struct x86_emulate_ctxt *ctxt)
 {
-	memset(&ctxt->has_seg_override, 0,
-	       (void *)&ctxt->modrm - (void *)&ctxt->has_seg_override);
+	memset(&ctxt->rip_relative, 0,
+	       (void *)&ctxt->modrm - (void *)&ctxt->rip_relative);
 
 	ctxt->io_read.pos = 0;
 	ctxt->io_read.end = 0;
-- 
1.8.3.1




* [PATCH 18/25] KVM: emulate: do not initialize memopp
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (16 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 17/25] KVM: emulate: rework seg_override Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 19/25] KVM: emulate: speed up do_insn_fetch Paolo Bonzini
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb, Bandan Das

From: Bandan Das <bsd@redhat.com>

rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instruction (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 5 +++++
 arch/x86/kvm/emulate.c             | 8 ++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 268bbd4aae99..692b0bfadf91 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -322,6 +322,11 @@ struct x86_emulate_ctxt {
 	struct operand dst;
 	int (*execute)(struct x86_emulate_ctxt *ctxt);
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
+	/*
+	 * The following six fields are cleared together,
+	 * the rest are initialized unconditionally in x86_decode_insn
+	 * or elsewhere
+	 */
 	bool rip_relative;
 	u8 rex_prefix;
 	u8 lock_prefix;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05f7bb416fc2..32f38a3467a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1176,6 +1176,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		}
 	}
 	op->addr.mem.ea = modrm_ea;
+	if (ctxt->ad_bytes != 8)
+		ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea;
+
 done:
 	return rc;
 }
@@ -4499,9 +4502,6 @@ done_prefixes:
 
 	ctxt->memop.addr.mem.seg = ctxt->seg_override;
 
-	if (ctxt->memop.type == OP_MEM && ctxt->ad_bytes != 8)
-		ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea;
-
 	/*
 	 * Decode and fetch the source operand: register, memory
 	 * or immediate.
@@ -4522,7 +4522,7 @@ done_prefixes:
 	rc = decode_operand(ctxt, &ctxt->dst, (ctxt->d >> DstShift) & OpMask);
 
 done:
-	if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative)
+	if (ctxt->rip_relative)
 		ctxt->memopp->addr.mem.ea += ctxt->_eip;
 
 	return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
-- 
1.8.3.1




* [PATCH 19/25] KVM: emulate: speed up do_insn_fetch
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (17 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 18/25] KVM: emulate: do not initialize memopp Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 20/25] KVM: emulate: avoid repeated calls to do_insn_fetch_bytes Paolo Bonzini
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Hoist the common case up from do_insn_fetch_byte to do_insn_fetch,
and prime the fetch_cache in x86_decode_insn.  This helps the compiler
and the branch predictor a bit, but above all it lays the groundwork
for further changes in the next few patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 67 +++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 32f38a3467a0..aee124a4cc6c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -704,51 +704,51 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
 }
 
 /*
- * Fetch the next byte of the instruction being emulated which is pointed to
- * by ctxt->_eip, then increment ctxt->_eip.
- *
- * Also prefetch the remaining bytes of the instruction without crossing page
+ * Prefetch the remaining bytes of the instruction without crossing page
  * boundary if they are not in fetch_cache yet.
  */
-static int do_insn_fetch_byte(struct x86_emulate_ctxt *ctxt, u8 *dest)
+static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt)
 {
 	struct fetch_cache *fc = &ctxt->fetch;
 	int rc;
 	int size, cur_size;
-
-	if (ctxt->_eip == fc->end) {
-		unsigned long linear;
-		struct segmented_address addr = { .seg = VCPU_SREG_CS,
-						  .ea  = ctxt->_eip };
-		cur_size = fc->end - fc->start;
-		size = min(15UL - cur_size,
-			   PAGE_SIZE - offset_in_page(ctxt->_eip));
-		rc = __linearize(ctxt, addr, size, false, true, &linear);
-		if (unlikely(rc != X86EMUL_CONTINUE))
-			return rc;
-		rc = ctxt->ops->fetch(ctxt, linear, fc->data + cur_size,
-				      size, &ctxt->exception);
-		if (unlikely(rc != X86EMUL_CONTINUE))
-			return rc;
-		fc->end += size;
-	}
-	*dest = fc->data[ctxt->_eip - fc->start];
-	ctxt->_eip++;
+	unsigned long linear;
+
+	struct segmented_address addr = { .seg = VCPU_SREG_CS,
+					  .ea  = fc->end };
+	cur_size = fc->end - fc->start;
+	size = min(15UL - cur_size,
+		   PAGE_SIZE - offset_in_page(fc->end));
+	if (unlikely(size == 0))
+		return X86EMUL_UNHANDLEABLE;
+	rc = __linearize(ctxt, addr, size, false, true, &linear);
+	if (unlikely(rc != X86EMUL_CONTINUE))
+		return rc;
+	rc = ctxt->ops->fetch(ctxt, linear, fc->data + cur_size,
+			      size, &ctxt->exception);
+	if (unlikely(rc != X86EMUL_CONTINUE))
+		return rc;
+	fc->end += size;
 	return X86EMUL_CONTINUE;
 }
 
 static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
-			 void *dest, unsigned size)
+			 void *__dest, unsigned size)
 {
 	int rc;
+	struct fetch_cache *fc = &ctxt->fetch;
+	u8 *dest = __dest;
+	u8 *src = &fc->data[ctxt->_eip - fc->start];
 
-	/* x86 instructions are limited to 15 bytes. */
-	if (unlikely(ctxt->_eip + size - ctxt->eip > 15))
-		return X86EMUL_UNHANDLEABLE;
 	while (size--) {
-		rc = do_insn_fetch_byte(ctxt, dest++);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
+		if (unlikely(ctxt->_eip == fc->end)) {
+			rc = do_insn_fetch_bytes(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				return rc;
+		}
+		*dest++ = *src++;
+		ctxt->_eip++;
+		continue;
 	}
 	return X86EMUL_CONTINUE;
 }
@@ -4301,6 +4301,11 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	ctxt->opcode_len = 1;
 	if (insn_len > 0)
 		memcpy(ctxt->fetch.data, insn, insn_len);
+	else {
+		rc = do_insn_fetch_bytes(ctxt);
+		if (rc != X86EMUL_CONTINUE)
+			return rc;
+	}
 
 	switch (mode) {
 	case X86EMUL_MODE_REAL:
-- 
1.8.3.1




* [PATCH 20/25] KVM: emulate: avoid repeated calls to do_insn_fetch_bytes
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (18 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 19/25] KVM: emulate: speed up do_insn_fetch Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 21/25] KVM: emulate: avoid per-byte copying in instruction fetches Paolo Bonzini
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

do_insn_fetch_bytes will only be called once in a given insn_fetch and
insn_fetch_arr, because in fact it will only be called at most twice
for any instruction and the first call is explicit in x86_decode_insn.
This observation lets us hoist the call out of the memory copying loop.
It does not buy performance, because most fetches are one byte long
anyway, but it prepares for the next patch.

The overflow check is tricky, but correct.  Because do_insn_fetch_bytes
has already been called once, we know that fc->end is at least 15.  So
it is okay to subtract the number of bytes we want to read.
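
A small worked illustration of the check (the numbers are made up):

/*
 * fc->start = 0x1000 with 8 bytes cached, so fc->end = 0x1008.  For a
 * 4-byte fetch the test "ctxt->_eip > fc->end - size" compares against
 * 0x1004:
 *   _eip == 0x1004: no refill, bytes 0x1004..0x1007 are all cached
 *   _eip == 0x1005: refill, byte 0x1008 is not cached yet
 * As noted above, size is at most 15 and fc->end is at least 15 after
 * the initial fetch, so fc->end - size cannot wrap and the unsigned
 * comparison is exact.
 */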

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aee124a4cc6c..f81d7a5bc19b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -707,7 +707,7 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
  * Prefetch the remaining bytes of the instruction without crossing page
  * boundary if they are not in fetch_cache yet.
  */
-static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt)
+static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 {
 	struct fetch_cache *fc = &ctxt->fetch;
 	int rc;
@@ -719,7 +719,14 @@ static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt)
 	cur_size = fc->end - fc->start;
 	size = min(15UL - cur_size,
 		   PAGE_SIZE - offset_in_page(fc->end));
-	if (unlikely(size == 0))
+
+	/*
+	 * One instruction can only straddle two pages,
+	 * and one has been loaded at the beginning of
+	 * x86_decode_insn.  So, if not enough bytes
+	 * still, we must have hit the 15-byte boundary.
+	 */
+	if (unlikely(size < op_size))
 		return X86EMUL_UNHANDLEABLE;
 	rc = __linearize(ctxt, addr, size, false, true, &linear);
 	if (unlikely(rc != X86EMUL_CONTINUE))
@@ -735,17 +742,18 @@ static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt)
 static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
 			 void *__dest, unsigned size)
 {
-	int rc;
 	struct fetch_cache *fc = &ctxt->fetch;
 	u8 *dest = __dest;
 	u8 *src = &fc->data[ctxt->_eip - fc->start];
 
+	/* We have to be careful about overflow! */
+	if (unlikely(ctxt->_eip > fc->end - size)) {
+		int rc = do_insn_fetch_bytes(ctxt, size);
+		if (rc != X86EMUL_CONTINUE)
+			return rc;
+	}
+
 	while (size--) {
-		if (unlikely(ctxt->_eip == fc->end)) {
-			rc = do_insn_fetch_bytes(ctxt);
-			if (rc != X86EMUL_CONTINUE)
-				return rc;
-		}
 		*dest++ = *src++;
 		ctxt->_eip++;
 		continue;
@@ -4302,7 +4310,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	if (insn_len > 0)
 		memcpy(ctxt->fetch.data, insn, insn_len);
 	else {
-		rc = do_insn_fetch_bytes(ctxt);
+		rc = do_insn_fetch_bytes(ctxt, 1);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 	}
-- 
1.8.3.1




* [PATCH 21/25] KVM: emulate: avoid per-byte copying in instruction fetches
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (19 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 20/25] KVM: emulate: avoid repeated calls to do_insn_fetch_bytes Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 22/25] KVM: emulate: put pointers in the fetch_cache Paolo Bonzini
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

We do not need a memory copying loop anymore in insn_fetch; we
can use a byte-aligned pointer to access instruction fields directly
from the fetch_cache.  This eliminates 50-150 cycles (corresponding to
a 5-10% improvement in performance) from each instruction.
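
A portable sketch of the same idea (the helper name is mine, not from
the patch): a fixed-size copy out of the byte cache that the compiler
can lower to a single, possibly unaligned, load instead of a per-byte
copying loop.

#include <string.h>
#include <stdint.h>

/* Stand-in for the patch's "*(_type __aligned(1) *)" access. */
static uint32_t fetch_cached_u32(const uint8_t *cache, unsigned int pos)
{
        uint32_t v;

        memcpy(&v, &cache[pos], sizeof(v));     /* typically one load on x86 */
        return v;
}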

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f81d7a5bc19b..6ec0be2d2461 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -707,7 +707,7 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
  * Prefetch the remaining bytes of the instruction without crossing page
  * boundary if they are not in fetch_cache yet.
  */
-static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
+static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 {
 	struct fetch_cache *fc = &ctxt->fetch;
 	int rc;
@@ -739,41 +739,39 @@ static int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 	return X86EMUL_CONTINUE;
 }
 
-static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
-			 void *__dest, unsigned size)
+static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt,
+					       unsigned size)
 {
-	struct fetch_cache *fc = &ctxt->fetch;
-	u8 *dest = __dest;
-	u8 *src = &fc->data[ctxt->_eip - fc->start];
-
 	/* We have to be careful about overflow! */
-	if (unlikely(ctxt->_eip > fc->end - size)) {
-		int rc = do_insn_fetch_bytes(ctxt, size);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
-	}
-
-	while (size--) {
-		*dest++ = *src++;
-		ctxt->_eip++;
-		continue;
-	}
-	return X86EMUL_CONTINUE;
+	if (unlikely(ctxt->_eip > ctxt->fetch.end - size))
+		return __do_insn_fetch_bytes(ctxt, size);
+	else
+		return X86EMUL_CONTINUE;
 }
 
 /* Fetch next part of the instruction being emulated. */
 #define insn_fetch(_type, _ctxt)					\
-({	unsigned long _x;						\
-	rc = do_insn_fetch(_ctxt, &_x, sizeof(_type));			\
+({	_type _x;							\
+	struct fetch_cache *_fc;					\
+									\
+	rc = do_insn_fetch_bytes(_ctxt, sizeof(_type));			\
 	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
-	(_type)_x;							\
+	_fc = &ctxt->fetch;						\
+	_x = *(_type __aligned(1) *) &_fc->data[ctxt->_eip - _fc->start]; \
+	ctxt->_eip += sizeof(_type);					\
+	_x;								\
 })
 
 #define insn_fetch_arr(_arr, _size, _ctxt)				\
-({	rc = do_insn_fetch(_ctxt, _arr, (_size));			\
+({									\
+	struct fetch_cache *_fc;					\
+	rc = do_insn_fetch_bytes(_ctxt, _size);				\
 	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
+	_fc = &ctxt->fetch;						\
+	memcpy(_arr, &_fc->data[ctxt->_eip - _fc->start], _size);	\
+	ctxt->_eip += (_size);						\
 })
 
 /*
@@ -4310,7 +4308,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	if (insn_len > 0)
 		memcpy(ctxt->fetch.data, insn, insn_len);
 	else {
-		rc = do_insn_fetch_bytes(ctxt, 1);
+		rc = __do_insn_fetch_bytes(ctxt, 1);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 	}
-- 
1.8.3.1




* [PATCH 22/25] KVM: emulate: put pointers in the fetch_cache
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (20 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 21/25] KVM: emulate: avoid per-byte copying in instruction fetches Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 23/25] KVM: x86: use kvm_read_guest_page for emulator accesses Paolo Bonzini
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

This simplifies the code a bit, especially the overflow checks.
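
For example, the remaining-bytes test turns into arithmetic on two
pointers into the same 15-byte buffer (sketch below, names are mine):

#include <stddef.h>

/* With ptr and end both pointing into fetch.data[], end - ptr is simply
 * the number of valid bytes left, so there is no wrap-around case to
 * reason about. */
static inline int fetch_cache_has(const unsigned char *ptr,
                                  const unsigned char *end, size_t size)
{
        return (size_t)(end - ptr) >= size;
}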

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  4 ++--
 arch/x86/kvm/emulate.c             | 34 +++++++++++++++-------------------
 arch/x86/kvm/trace.h               |  6 +++---
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 692b0bfadf91..012fff3fea0d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -269,8 +269,8 @@ struct operand {
 
 struct fetch_cache {
 	u8 data[15];
-	unsigned long start;
-	unsigned long end;
+	u8 *ptr;
+	u8 *end;
 };
 
 struct read_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6ec0be2d2461..6b44f90a9bb0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -709,16 +709,15 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
  */
 static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 {
-	struct fetch_cache *fc = &ctxt->fetch;
 	int rc;
-	int size, cur_size;
+	int size;
 	unsigned long linear;
-
+	int cur_size = ctxt->fetch.end - ctxt->fetch.data;
 	struct segmented_address addr = { .seg = VCPU_SREG_CS,
-					  .ea  = fc->end };
-	cur_size = fc->end - fc->start;
+					   .ea = ctxt->eip + cur_size };
+
 	size = min(15UL - cur_size,
-		   PAGE_SIZE - offset_in_page(fc->end));
+		   PAGE_SIZE - offset_in_page(addr.ea));
 
 	/*
 	 * One instruction can only straddle two pages,
@@ -731,19 +730,18 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 	rc = __linearize(ctxt, addr, size, false, true, &linear);
 	if (unlikely(rc != X86EMUL_CONTINUE))
 		return rc;
-	rc = ctxt->ops->fetch(ctxt, linear, fc->data + cur_size,
+	rc = ctxt->ops->fetch(ctxt, linear, ctxt->fetch.end,
 			      size, &ctxt->exception);
 	if (unlikely(rc != X86EMUL_CONTINUE))
 		return rc;
-	fc->end += size;
+	ctxt->fetch.end += size;
 	return X86EMUL_CONTINUE;
 }
 
 static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt,
 					       unsigned size)
 {
-	/* We have to be careful about overflow! */
-	if (unlikely(ctxt->_eip > ctxt->fetch.end - size))
+	if (unlikely(ctxt->fetch.end - ctxt->fetch.ptr < size))
 		return __do_insn_fetch_bytes(ctxt, size);
 	else
 		return X86EMUL_CONTINUE;
@@ -752,26 +750,24 @@ static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt,
 /* Fetch next part of the instruction being emulated. */
 #define insn_fetch(_type, _ctxt)					\
 ({	_type _x;							\
-	struct fetch_cache *_fc;					\
 									\
 	rc = do_insn_fetch_bytes(_ctxt, sizeof(_type));			\
 	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
-	_fc = &ctxt->fetch;						\
-	_x = *(_type __aligned(1) *) &_fc->data[ctxt->_eip - _fc->start]; \
 	ctxt->_eip += sizeof(_type);					\
+	_x = *(_type __aligned(1) *) ctxt->fetch.ptr;			\
+	ctxt->fetch.ptr += sizeof(_type);				\
 	_x;								\
 })
 
 #define insn_fetch_arr(_arr, _size, _ctxt)				\
 ({									\
-	struct fetch_cache *_fc;					\
 	rc = do_insn_fetch_bytes(_ctxt, _size);				\
 	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
-	_fc = &ctxt->fetch;						\
-	memcpy(_arr, &_fc->data[ctxt->_eip - _fc->start], _size);	\
 	ctxt->_eip += (_size);						\
+	memcpy(_arr, ctxt->fetch.ptr, _size);				\
+	ctxt->fetch.ptr += (_size);					\
 })
 
 /*
@@ -4302,8 +4298,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	ctxt->memop.type = OP_NONE;
 	ctxt->memopp = NULL;
 	ctxt->_eip = ctxt->eip;
-	ctxt->fetch.start = ctxt->_eip;
-	ctxt->fetch.end = ctxt->fetch.start + insn_len;
+	ctxt->fetch.ptr = ctxt->fetch.data;
+	ctxt->fetch.end = ctxt->fetch.data + insn_len;
 	ctxt->opcode_len = 1;
 	if (insn_len > 0)
 		memcpy(ctxt->fetch.data, insn, insn_len);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 33574c95220d..e850a7d332be 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -721,10 +721,10 @@ TRACE_EVENT(kvm_emulate_insn,
 		),
 
 	TP_fast_assign(
-		__entry->rip = vcpu->arch.emulate_ctxt.fetch.start;
 		__entry->csbase = kvm_x86_ops->get_segment_base(vcpu, VCPU_SREG_CS);
-		__entry->len = vcpu->arch.emulate_ctxt._eip
-			       - vcpu->arch.emulate_ctxt.fetch.start;
+		__entry->len = vcpu->arch.emulate_ctxt.fetch.ptr
+			       - vcpu->arch.emulate_ctxt.fetch.data;
+		__entry->rip = vcpu->arch.emulate_ctxt._eip - __entry->len;
 		memcpy(__entry->insn,
 		       vcpu->arch.emulate_ctxt.fetch.data,
 		       15);
-- 
1.8.3.1




* [PATCH 23/25] KVM: x86: use kvm_read_guest_page for emulator accesses
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (21 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 22/25] KVM: emulate: put pointers in the fetch_cache Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 24/25] KVM: emulate: simplify BitOp handling Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 25/25] KVM: emulate: fix harmless typo in MMX decoding Paolo Bonzini
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Emulator accesses are always done a page at a time, either by the emulator
itself (for fetches) or because we need to query the MMU for address
translations.  Speed up these accesses by using kvm_read_guest_page
and, in the case of fetches, by inlining kvm_read_guest_virt_helper and
dropping the loop around kvm_read_guest_page.

This final tweak saves 30-100 more clock cycles (4-10%), bringing the
count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on
a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series
and 925-1700 after the first two low-hanging fruit changes.
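
A quick worked example (the numbers are made up) of the page-at-a-time
assumption in kvm_fetch_guest_virt:

/*
 * addr = 0x7c0ffd, PAGE_SIZE = 4096:
 *   offset = addr & (PAGE_SIZE - 1) = 0xffd, leaving only 3 bytes in
 *   this page.  The fetch cache never asks for more than that, because
 *   __do_insn_fetch_bytes limits each prefetch to
 *   min(15 - cur_size, PAGE_SIZE - offset_in_page(ea)); the WARN_ON in
 *   the new code just clamps the access should that invariant ever be
 *   violated.
 */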

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 401bdef24d54..179b7c39c3fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4081,7 +4081,8 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
 
 		if (gpa == UNMAPPED_GVA)
 			return X86EMUL_PROPAGATE_FAULT;
-		ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
+		ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data,
+					  offset, toread);
 		if (ret < 0) {
 			r = X86EMUL_IO_NEEDED;
 			goto out;
@@ -4102,10 +4103,24 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
 {
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	unsigned offset;
+	int ret;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
-					  access | PFERR_FETCH_MASK,
-					  exception);
+	/* Inline kvm_read_guest_virt_helper for speed.  */
+	gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK,
+						    exception);
+	if (unlikely(gpa == UNMAPPED_GVA))
+		return X86EMUL_PROPAGATE_FAULT;
+
+	offset = addr & (PAGE_SIZE-1);
+	if (WARN_ON(offset + bytes > PAGE_SIZE))
+		bytes = (unsigned)PAGE_SIZE - offset;
+	ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, val,
+				  offset, bytes);
+	if (unlikely(ret < 0))
+		return X86EMUL_IO_NEEDED;
+
+	return X86EMUL_CONTINUE;
 }
 
 int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
-- 
1.8.3.1




* [PATCH 24/25] KVM: emulate: simplify BitOp handling
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (22 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 23/25] KVM: x86: use kvm_read_guest_page for emulator accesses Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  2014-06-09 12:59 ` [PATCH 25/25] KVM: emulate: fix harmless typo in MMX decoding Paolo Bonzini
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

Memory is always the destination for BitOp instructions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6b44f90a9bb0..2e8f0e674616 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4145,7 +4145,7 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	mem_common:
 		*op = ctxt->memop;
 		ctxt->memopp = op;
-		if ((ctxt->d & BitOp) && op == &ctxt->dst)
+		if (ctxt->d & BitOp)
 			fetch_bit_operand(ctxt);
 		op->orig_val = op->val;
 		break;
-- 
1.8.3.1




* [PATCH 25/25] KVM: emulate: fix harmless typo in MMX decoding
  2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
                   ` (23 preceding siblings ...)
  2014-06-09 12:59 ` [PATCH 24/25] KVM: emulate: simplify BitOp handling Paolo Bonzini
@ 2014-06-09 12:59 ` Paolo Bonzini
  24 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: bdas, gleb

It was using the wrong member of the union.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2e8f0e674616..b8dc78f4672d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1081,7 +1081,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		if (ctxt->d & Mmx) {
 			op->type = OP_MM;
 			op->bytes = 8;
-			op->addr.xmm = ctxt->modrm_rm & 7;
+			op->addr.mm = ctxt->modrm_rm & 7;
 			return rc;
 		}
 		fetch_register_operand(op);
-- 
1.8.3.1



* Re: [PATCH 12/25] KVM: emulate: extend memory access optimization to stores
  2014-06-09 12:59 ` [PATCH 12/25] KVM: emulate: extend memory access optimization to stores Paolo Bonzini
@ 2014-06-09 18:40   ` Bandan Das
  2014-06-19 11:37     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Bandan Das @ 2014-06-09 18:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, gleb

Paolo Bonzini <pbonzini@redhat.com> writes:

> Even on a store the optimization saves about 50 clock cycles,
> mostly because the jump in write_memory_operand becomes much more
> predictable.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Isn't the reviewed-by automatically implied? :)

> ---
>  arch/x86/kvm/emulate.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 594cb560947c..eaf0853ffaf9 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1589,7 +1589,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
>  
>  static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
>  				  struct operand *op,
> -				  bool write)
> +				  bool read, bool write)
>  {
>  	int rc;
>  	unsigned long gva;
> @@ -1605,6 +1605,10 @@ static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
>  	if (rc != X86EMUL_CONTINUE)
>  		return rc;
>  
> +	/* optimisation - avoid slow emulated read if Mov */
> +	if (!read)
> +		return X86EMUL_CONTINUE;
> +
>  	if (likely(!kvm_is_error_hva(op->hva))) {
>  		rc = read_from_user(ctxt, op->hva, &op->val, size);
>  		if (!write)
> @@ -4699,14 +4703,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>  	}
>  
>  	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
> -		rc = prepare_memory_operand(ctxt, &ctxt->src, false);
> +		rc = prepare_memory_operand(ctxt, &ctxt->src, true, false);
>  		if (rc != X86EMUL_CONTINUE)
>  			goto done;
>  		ctxt->src.orig_val64 = ctxt->src.val64;
>  	}
>  
>  	if (ctxt->src2.type == OP_MEM) {
> -		rc = prepare_memory_operand(ctxt, &ctxt->src2, false);
> +		rc = prepare_memory_operand(ctxt, &ctxt->src2, true, false);
>  		if (rc != X86EMUL_CONTINUE)
>  			goto done;
>  	}
> @@ -4715,9 +4719,9 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>  		goto special_insn;
>  
>  
> -	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
> -		/* optimisation - avoid slow emulated read if Mov */
> +	if (ctxt->dst.type == OP_MEM) {
>  		rc = prepare_memory_operand(ctxt, &ctxt->dst,
> +					    !(ctxt->d & Mov),
>  					    !(ctxt->d & NoWrite));
>  		if (rc != X86EMUL_CONTINUE)
>  			goto done;


* Re: [PATCH 12/25] KVM: emulate: extend memory access optimization to stores
  2014-06-09 18:40   ` Bandan Das
@ 2014-06-19 11:37     ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2014-06-19 11:37 UTC (permalink / raw)
  To: Bandan Das; +Cc: linux-kernel, gleb

On 09/06/2014 20:40, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> > Even on a store the optimization saves about 50 clock cycles,
>> > mostly because the jump in write_memory_operand becomes much more
>> > predictable.
>> >
>> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Isn't the reviewed-by automatically implied? :)
>

This patch breaks x86/vmx.flat, but apart from this the series passes 
Autotest and kvm-unit-tests.  I'm going to leave the memory access 
optimization out, and push the rest to kvm/queue.

Paolo


Thread overview: 28+ messages
2014-06-09 12:58 [PATCH 00/25] KVM: x86: Speed up emulation of invalid state Paolo Bonzini
2014-06-09 12:58 ` [PATCH 01/25] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
2014-06-09 12:58 ` [PATCH 02/25] KVM: x86: return all bits from get_interrupt_shadow Paolo Bonzini
2014-06-09 12:58 ` [PATCH 03/25] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
2014-06-09 12:58 ` [PATCH 04/25] KVM: emulate: move around some checks Paolo Bonzini
2014-06-09 12:58 ` [PATCH 05/25] KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
2014-06-09 12:58 ` [PATCH 06/25] KVM: emulate: speed up emulated moves Paolo Bonzini
2014-06-09 12:58 ` [PATCH 07/25] KVM: emulate: simplify writeback Paolo Bonzini
2014-06-09 12:58 ` [PATCH 08/25] KVM: emulate: abstract handling of memory operands Paolo Bonzini
2014-06-09 12:58 ` [PATCH 09/25] KVM: export mark_page_dirty_in_slot Paolo Bonzini
2014-06-09 12:58 ` [PATCH 10/25] KVM: emulate: introduce memory_prepare callback to speed up memory access Paolo Bonzini
2014-06-09 12:58 ` [PATCH 11/25] KVM: emulate: activate memory access optimization Paolo Bonzini
2014-06-09 12:59 ` [PATCH 12/25] KVM: emulate: extend memory access optimization to stores Paolo Bonzini
2014-06-09 18:40   ` Bandan Das
2014-06-19 11:37     ` Paolo Bonzini
2014-06-09 12:59 ` [PATCH 13/25] KVM: emulate: move init_decode_cache to emulate.c Paolo Bonzini
2014-06-09 12:59 ` [PATCH 14/25] KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks Paolo Bonzini
2014-06-09 12:59 ` [PATCH 15/25] KVM: emulate: cleanup decode_modrm Paolo Bonzini
2014-06-09 12:59 ` [PATCH 16/25] KVM: emulate: clean up initializations in init_decode_cache Paolo Bonzini
2014-06-09 12:59 ` [PATCH 17/25] KVM: emulate: rework seg_override Paolo Bonzini
2014-06-09 12:59 ` [PATCH 18/25] KVM: emulate: do not initialize memopp Paolo Bonzini
2014-06-09 12:59 ` [PATCH 19/25] KVM: emulate: speed up do_insn_fetch Paolo Bonzini
2014-06-09 12:59 ` [PATCH 20/25] KVM: emulate: avoid repeated calls to do_insn_fetch_bytes Paolo Bonzini
2014-06-09 12:59 ` [PATCH 21/25] KVM: emulate: avoid per-byte copying in instruction fetches Paolo Bonzini
2014-06-09 12:59 ` [PATCH 22/25] KVM: emulate: put pointers in the fetch_cache Paolo Bonzini
2014-06-09 12:59 ` [PATCH 23/25] KVM: x86: use kvm_read_guest_page for emulator accesses Paolo Bonzini
2014-06-09 12:59 ` [PATCH 24/25] KVM: emulate: simplify BitOp handling Paolo Bonzini
2014-06-09 12:59 ` [PATCH 25/25] KVM: emulate: fix harmless typo in MMX decoding Paolo Bonzini
