* [RFC PATCH 0/5] KVM: speed up invalid guest state emulation
@ 2014-03-27 11:30 Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

This series identifies some low-hanging fruit for speeding up emulation
of invalid guest state.  I am a bit worried about patch 2, while
everything else should be relatively safe.

On the kvm-unit-tests microbenchmarks I get a 1.8-2.5x speedup (from 
740-1100 cycles/instruction to 280-600).
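
For anyone wanting to reproduce these numbers, the measurement boils down
to timing a tight loop of emulated instructions with the TSC.  Below is a
minimal sketch in the spirit of the kvm-unit-tests micro-benchmarks; the
helper names and the instruction under test are illustrative assumptions,
not the actual test code:

	static inline unsigned long long rdtsc(void)
	{
		unsigned int lo, hi;

		asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
		return ((unsigned long long)hi << 32) | lo;
	}

	#define ITERATIONS 100000

	unsigned long long t0, cycles_per_insn;
	int i;

	t0 = rdtsc();
	for (i = 0; i < ITERATIONS; i++)
		asm volatile("mov %%eax, %%ebx" ::: "ebx"); /* emulated move */
	cycles_per_insn = (rdtsc() - t0) / ITERATIONS;
	/* loop overhead (also emulated) is included, so this is an upper bound */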

This is not for 3.15, of course.  Even patch 5 is not too useful without
the others; saving 40 clock cycles is a 10% speedup after the previous
patches (40 out of roughly 400 cycles), but only 4% before (40 out of
roughly 1000).

Please review carefully!

Paolo

Paolo Bonzini (5):
  KVM: vmx: speed up emulation of invalid guest state
  KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  KVM: x86: move around some checks
  KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
  KVM: x86: speed up emulated moves

 arch/x86/include/asm/kvm_emulate.h |   2 +-
 arch/x86/kvm/emulate.c             | 182 ++++++++++++++++++++-----------------
 arch/x86/kvm/vmx.c                 |   5 +-
 arch/x86/kvm/x86.c                 |  28 ++++--
 4 files changed, 120 insertions(+), 97 deletions(-)

-- 
1.8.3.1



* [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-04-16 22:52   ` Marcelo Tosatti
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.
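
For reference, the check being cached boils down to a tiny helper along
these lines (paraphrased from vmx.c; treat the exact body as an
assumption rather than part of this patch):

	static bool emulation_required(struct kvm_vcpu *vcpu)
	{
		/* emulate while the guest state cannot be entered natively */
		return emulate_invalid_guest_state && !guest_state_valid(vcpu);
	}

Because the only inputs to this check are the segment registers and CR0,
re-evaluating it in the setters is enough to keep the cached value current.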

Emulation performance test results indicate 530-870 cycles
for common instructions, versus 740-1110 before this patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1320e0f8e611..73aa522db47b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 
 out:
-	vmx->emulation_required |= emulation_required(vcpu);
+	vmx->emulation_required = emulation_required(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
 
-	while (!guest_state_valid(vcpu) && count-- != 0) {
+	while (vmx->emulation_required && count-- != 0) {
 		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
 			return handle_interrupt_window(&vmx->vcpu);
 
@@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 			schedule();
 	}
 
-	vmx->emulation_required = emulation_required(vcpu);
 out:
 	return ret;
 }
-- 
1.8.3.1




* [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-28 12:44   ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.

However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; b) if the interrupt flag has just been set (because other
instructions than STI can set it without enabling an interrupt shadow).
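
Expressed as a stand-alone sketch (illustrative only; the helper below is
not part of the patch), the condition for requesting an event check is:

	/* true when the emulated instruction may have opened the interrupt window */
	static bool interrupt_window_may_open(unsigned long old_rflags,
					      unsigned long new_rflags,
					      u32 old_shadow, u32 new_shadow)
	{
		/* a) an interrupt-shadow bit (STI/MOV SS) was cleared */
		if (old_shadow & ~new_shadow)
			return true;
		/* b) IF went from 0 to 1 (e.g. via popf/iret, not only sti) */
		return (new_rflags & ~old_rflags) & X86_EFLAGS_IF;
	}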

This cuts another 250-300 clock cycles from the cost of emulating an
instruction (530-870 cycles before the patch on kvm-unit-tests,
290-600 afterwards).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd31aada351b..ce9523345f2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 
 struct kvm_x86_ops *kvm_x86_ops;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	if (unlikely(int_shadow) && !(int_shadow & mask)) {
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
 }
 
 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
@@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
 	return dr6;
 }
 
-static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
+static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
 {
 	struct kvm_run *kvm_run = vcpu->run;
 
 	/*
-	 * Use the "raw" value to see if TF was passed to the processor.
-	 * Note that the new value of the flags has not been saved yet.
+	 * rflags is the old, "raw" value of the flags.  The new value has
+	 * not been saved yet.
 	 *
 	 * This is correct even for TF set by the guest, because "the
 	 * processor will not generate this exception after the instruction
 	 * that sets the TF flag".
 	 */
-	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
-
 	if (unlikely(rflags & X86_EFLAGS_TF)) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
@@ -5263,13 +5264,15 @@ restart:
 		r = EMULATE_DONE;
 
 	if (writeback) {
+		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (r == EMULATE_DONE)
-			kvm_vcpu_check_singlestep(vcpu, &r);
-		kvm_set_rflags(vcpu, ctxt->eflags);
+			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
+		__kvm_set_rflags(vcpu, ctxt->eflags);
+		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
 	} else
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
 
@@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_rflags);
 
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
 	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
 		rflags |= X86_EFLAGS_TF;
 	kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	__kvm_set_rflags(vcpu, rflags);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
-- 
1.8.3.1




* [RFC PATCH 3/5] KVM: x86: move around some checks
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

The only purpose of this patch is to make the next patch simpler
to review.  No semantic change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c9f8f61df46c..9e40fbf94dcd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4347,12 +4347,15 @@ done_prefixes:
 		ctxt->d |= opcode.flags;
 	}
 
+	/* Unrecognised? */
+	if (ctxt->d == 0)
+		return EMULATION_FAILED;
+
 	ctxt->execute = opcode.u.execute;
 	ctxt->check_perm = opcode.check_perm;
 	ctxt->intercept = opcode.intercept;
 
-	/* Unrecognised? */
-	if (ctxt->d == 0 || (ctxt->d & NotImpl))
+	if (ctxt->d & NotImpl)
 		return EMULATION_FAILED;
 
 	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
@@ -4494,19 +4497,19 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->mem_read.pos = 0;
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
+	/* LOCK prefix is allowed only with some instructions */
+	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	/* LOCK prefix is allowed only with some instructions */
-	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
+	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
+	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+			(ctxt->d & Undefined)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
-- 
1.8.3.1




* [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
To do this effectively, add two more flag bits that say whether the
.intercept and .check_perm fields are valid.

This shaves about 10 cycles off each emulated instruction, which is
a 2 to 6% improvement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	To review, use -b (ignore whitespace changes).

 arch/x86/kvm/emulate.c | 175 ++++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9e40fbf94dcd..94974055d906 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -161,6 +161,8 @@
 #define Fastop      ((u64)1 << 44)  /* Use opcode::u.fastop */
 #define NoWrite     ((u64)1 << 45)  /* No writeback */
 #define SrcWrite    ((u64)1 << 46)  /* Write back src operand */
+#define Intercept   ((u64)1 << 47)  /* Has intercept field */
+#define CheckPerm   ((u64)1 << 48)  /* Has check_perm field */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -3514,9 +3516,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 }
 
 #define D(_y) { .flags = (_y) }
-#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
-#define DIP(_y, _i, _p) { .flags = (_y), .intercept = x86_intercept_##_i, \
-		      .check_perm = (_p) }
+#define DI(_y, _i) { .flags = (_y)|Intercept, .intercept = x86_intercept_##_i }
+#define DIP(_y, _i, _p) { .flags = (_y)|Intercept|CheckPerm, \
+		      .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define N    D(NotImpl)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
@@ -3525,10 +3527,10 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 #define II(_f, _e, _i) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
+	{ .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
 #define IIP(_f, _e, _i, _p) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i, \
-	  .check_perm = (_p) }
+	{ .flags = (_f)|Intercept|CheckPerm, .u.execute = (_e), \
+	  .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define GP(_f, _g) { .flags = ((_f) | Prefix), .u.gprefix = (_g) }
 
 #define D2bv(_f)      D((_f) | ByteOp), D(_f)
@@ -4352,29 +4354,37 @@ done_prefixes:
 		return EMULATION_FAILED;
 
 	ctxt->execute = opcode.u.execute;
-	ctxt->check_perm = opcode.check_perm;
-	ctxt->intercept = opcode.intercept;
 
-	if (ctxt->d & NotImpl)
-		return EMULATION_FAILED;
+	if (unlikely(ctxt->d &
+		     (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
+		/*
+		 * These are copied unconditionally here, and checked unconditionally
+		 * in x86_emulate_insn.
+		 */
+		ctxt->check_perm = opcode.check_perm;
+		ctxt->intercept = opcode.intercept;
 
-	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
-		return EMULATION_FAILED;
+		if (ctxt->d & NotImpl)
+			return EMULATION_FAILED;
 
-	if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
-		ctxt->op_bytes = 8;
+		if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+			return EMULATION_FAILED;
 
-	if (ctxt->d & Op3264) {
-		if (mode == X86EMUL_MODE_PROT64)
+		if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
 			ctxt->op_bytes = 8;
-		else
-			ctxt->op_bytes = 4;
-	}
 
-	if (ctxt->d & Sse)
-		ctxt->op_bytes = 16;
-	else if (ctxt->d & Mmx)
-		ctxt->op_bytes = 8;
+		if (ctxt->d & Op3264) {
+			if (mode == X86EMUL_MODE_PROT64)
+				ctxt->op_bytes = 8;
+			else
+				ctxt->op_bytes = 4;
+		}
+
+		if (ctxt->d & Sse)
+			ctxt->op_bytes = 16;
+		else if (ctxt->d & Mmx)
+			ctxt->op_bytes = 8;
+	}
 
 	/* ModRM and SIB bytes. */
 	if (ctxt->d & ModRM) {
@@ -4508,75 +4518,78 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto done;
 	}
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
-	    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
-		rc = emulate_nm(ctxt);
-		goto done;
-	}
+	if (unlikely(ctxt->d &
+		     (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
+		if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+				(ctxt->d & Undefined)) {
+			rc = emulate_ud(ctxt);
+			goto done;
+		}
 
-	if (ctxt->d & Mmx) {
-		rc = flush_pending_x87_faults(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
+		    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
+			rc = emulate_ud(ctxt);
 			goto done;
-		/*
-		 * Now that we know the fpu is exception safe, we can fetch
-		 * operands from it.
-		 */
-		fetch_possible_mmx_operand(ctxt, &ctxt->src);
-		fetch_possible_mmx_operand(ctxt, &ctxt->src2);
-		if (!(ctxt->d & Mov))
-			fetch_possible_mmx_operand(ctxt, &ctxt->dst);
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_PRE_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
+			rc = emulate_nm(ctxt);
 			goto done;
-	}
+		}
 
-	/* Privileged instruction can be executed only in CPL=0 */
-	if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
-		rc = emulate_gp(ctxt, 0);
-		goto done;
-	}
+		if (ctxt->d & Mmx) {
+			rc = flush_pending_x87_faults(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+			/*
+			 * Now that we know the fpu is exception safe, we can fetch
+			 * operands from it.
+			 */
+			fetch_possible_mmx_operand(ctxt, &ctxt->src);
+			fetch_possible_mmx_operand(ctxt, &ctxt->src2);
+			if (!(ctxt->d & Mov))
+				fetch_possible_mmx_operand(ctxt, &ctxt->dst);
+		}
 
-	/* Instruction can only be executed in protected mode */
-	if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_PRE_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
 
-	/* Do instruction specific permission checks */
-	if (ctxt->check_perm) {
-		rc = ctxt->check_perm(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		/* Privileged instruction can be executed only in CPL=0 */
+		if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
+			rc = emulate_gp(ctxt, 0);
 			goto done;
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_POST_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		/* Instruction can only be executed in protected mode */
+		if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
+			rc = emulate_ud(ctxt);
 			goto done;
-	}
+		}
 
-	if (ctxt->rep_prefix && (ctxt->d & String)) {
-		/* All REP prefixes have the same first termination condition */
-		if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
-			ctxt->eip = ctxt->_eip;
-			goto done;
+		/* Do instruction specific permission checks */
+		if (ctxt->check_perm) {
+			rc = ctxt->check_perm(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_POST_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (ctxt->rep_prefix && (ctxt->d & String)) {
+			/* All REP prefixes have the same first termination condition */
+			if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
+				ctxt->eip = ctxt->_eip;
+				goto done;
+			}
 		}
 	}
 
-- 
1.8.3.1




* [RFC PATCH 5/5] KVM: x86: speed up emulated moves
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
                   ` (3 preceding siblings ...)
  2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.
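
To see why the over-copy is harmless, recall that the register write-back
only stores op->bytes worth of data, roughly as follows (paraphrased; the
exact code may differ):

	static void write_register_operand(struct operand *op)
	{
		switch (op->bytes) {
		case 1:
			*(u8 *)op->addr.reg = (u8)op->val;
			break;
		case 2:
			*(u16 *)op->addr.reg = (u16)op->val;
			break;
		case 4:
			*op->addr.reg = (u32)op->val;	/* 64-bit mode zero-extends */
			break;
		case 8:
			*op->addr.reg = op->val;
			break;
		}
	}

so the extra bytes copied by em_mov never reach the destination register.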

Avoiding a call to memcpy (the compiler optimizes it out) gains about 50
cycles on kvm-unit-tests for register-to-register moves, and makes them
about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 2 +-
 arch/x86/kvm/emulate.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 24ec1216596e..f7b1e45eb753 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct operand {
 	union {
 		unsigned long val;
 		u64 val64;
-		char valptr[sizeof(unsigned long) + 2];
+		char valptr[sizeof(sse128_t)];
 		sse128_t vec_val;
 		u64 mm_val;
 		void *data;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 94974055d906..4a3584d419e5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2955,7 +2955,7 @@ static int em_rdpmc(struct x86_emulate_ctxt *ctxt)
 
 static int em_mov(struct x86_emulate_ctxt *ctxt)
 {
-	memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
+	memcpy(ctxt->dst.valptr, ctxt->src.valptr, sizeof(ctxt->src.valptr));
 	return X86EMUL_CONTINUE;
 }
 
-- 
1.8.3.1



* Re: [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-28 12:44   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-28 12:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

On 27/03/2014 12:30, Paolo Bonzini wrote:
> Despite the provisions to emulate up to 130 consecutive instructions, in
> practice KVM will emulate just one before exiting handle_invalid_guest_state,
> because x86_emulate_instruction always sets KVM_REQ_EVENT.
> 
> However, we only need to do this if an interrupt could be injected,
> which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
> away; b) if the interrupt flag has just been set (because other
> instructions than STI can set it without enabling an interrupt shadow).
> 
> This cuts another 250-300 clock cycles from the cost of emulating an
> instruction (530-870 cycles before the patch on kvm-unit-tests,
> 290-600 afterwards).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd31aada351b..ce9523345f2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>  
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>  static void process_nmi(struct kvm_vcpu *vcpu);
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>  
>  struct kvm_x86_ops *kvm_x86_ops;
>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>  	 * means that the last instruction is an sti. We should not
>  	 * leave the flag on in this case. The same goes for mov ss
>  	 */
> -	if (!(int_shadow & mask))
> +	if (unlikely(int_shadow) && !(int_shadow & mask)) {
>  		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}

Better:

 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	mask &= ~int_shadow;
+	if (unlikely(mask != int_shadow))
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+
+	/*
+	 * The interrupt window might have opened if a bit has been cleared.
+	 */
+	if (unlikely(int_shadow & ~mask))
+		kvm_make_request(KVM_REQ_EVENT, vcpu);

Paolo

>  }
>  
>  static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> @@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>  	return dr6;
>  }
>  
> -static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
> +static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
>  {
>  	struct kvm_run *kvm_run = vcpu->run;
>  
>  	/*
> -	 * Use the "raw" value to see if TF was passed to the processor.
> -	 * Note that the new value of the flags has not been saved yet.
> +	 * rflags is the old, "raw" value of the flags.  The new value has
> +	 * not been saved yet.
>  	 *
>  	 * This is correct even for TF set by the guest, because "the
>  	 * processor will not generate this exception after the instruction
>  	 * that sets the TF flag".
>  	 */
> -	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> -
>  	if (unlikely(rflags & X86_EFLAGS_TF)) {
>  		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>  			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> @@ -5263,13 +5264,15 @@ restart:
>  		r = EMULATE_DONE;
>  
>  	if (writeback) {
> +		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
> -		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>  		kvm_rip_write(vcpu, ctxt->eip);
>  		if (r == EMULATE_DONE)
> -			kvm_vcpu_check_singlestep(vcpu, &r);
> -		kvm_set_rflags(vcpu, ctxt->eflags);
> +			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
> +		__kvm_set_rflags(vcpu, ctxt->eflags);
> +		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	} else
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
>  
> @@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_rflags);
>  
> -void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
>  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
>  	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
>  		rflags |= X86_EFLAGS_TF;
>  	kvm_x86_ops->set_rflags(vcpu, rflags);
> +}
> +
> +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +{
> +	__kvm_set_rflags(vcpu, rflags);
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_rflags);
> 



* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-04-16 22:52   ` Marcelo Tosatti
  2014-04-18  4:19     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-16 22:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Mar 27, 2014 at 12:30:34PM +0100, Paolo Bonzini wrote:
> About 25% of the time spent in emulation of invalid guest state
> is wasted in checking whether emulation is required for the next
> instruction.  However, this almost never changes except when a
> segment register (or TR or LDTR) changes, or when there is a mode
> transition (i.e. CR0 changes).
> 
> In fact, vmx_set_segment and vmx_set_cr0 already modify
> vmx->emulation_required (except that the former for some reason
> uses |= instead of just an assignment).  So there is no need to
> call guest_state_valid in the emulation loop.
> 
> Emulation performance test results indicate 530-870 cycles
> for common instructions, versus 740-1110 before this patch.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1320e0f8e611..73aa522db47b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
>  	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
>  
>  out:
> -	vmx->emulation_required |= emulation_required(vcpu);
> +	vmx->emulation_required = emulation_required(vcpu);
>  }
>  
>  static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
> @@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>  	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
>  
> -	while (!guest_state_valid(vcpu) && count-- != 0) {
> +	while (vmx->emulation_required && count-- != 0) {
>  		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
>  			return handle_interrupt_window(&vmx->vcpu);
>  
> @@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  			schedule();
>  	}
>  
> -	vmx->emulation_required = emulation_required(vcpu);
>  out:
>  	return ret;
>  }
> -- 
> 1.8.3.1

How about handling VM-entry error due to invalid state with

vmx->emulation_required = true;
continue to main vcpu loop;





* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-16 22:52   ` Marcelo Tosatti
@ 2014-04-18  4:19     ` Paolo Bonzini
  2014-04-21  2:13       ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-18  4:19 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, kvm

On 16/04/2014 18:52, Marcelo Tosatti wrote:
> How about handling VM-entry error due to invalid state with
>
> vmx->emulation_required = true;
> continue to main vcpu loop;

What would reset it to false though?  None of the places that call 
emulation_required() is a hot path right now, and this patch doesn't add 
any.

Paolo


* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-18  4:19     ` Paolo Bonzini
@ 2014-04-21  2:13       ` Marcelo Tosatti
  2014-04-22  3:25         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-21  2:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Fri, Apr 18, 2014 at 12:19:28AM -0400, Paolo Bonzini wrote:
> Il 16/04/2014 18:52, Marcelo Tosatti ha scritto:
> >How about handling VM-entry error due to invalid state with
> >
> >vmx->emulation_required = true;
> >continue to main vcpu loop;
> 
> What would reset it to false though?  None of the places that call
> emulation_required() is a hot path right now, and this patch doesn't
> add any.

The same code which resets it to false inside the
handle_invalid_guest_state loop (so you would stop emulating
at the same point as you do with this patch).

The advantage would be that a failure to set vmx->emulation_required to
true would not cause a VM-entry failure.



* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-21  2:13       ` Marcelo Tosatti
@ 2014-04-22  3:25         ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-22  3:25 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, kvm

On 20/04/2014 22:13, Marcelo Tosatti wrote:
> The same code which resets it to false inside the
> handle_invalid_guest_state loop (so you would stop emulating
> at the same point as you do with this patch).

So (barring any bugs where we fail to set vmx->emulation_required to 
true) the "vmx->emulation_required = true;" on vmentry error would be 
dead code.

> Advantage would be that failure to set vmx->emulation_required to
> true would not cause VM-entry failure.

A place where we fail to set vmx->emulation_required to true is quite 
likely to also be wrong when setting vmx->emulation_required to false. 
Since this is not something that has ever seen much churn, I think it's 
better to code it in a way that shows bugs easily.  The bugs do not 
affect the host anyway.

Paolo

