* [RFC PATCH 0/6] KVM: x86: speedups for emulator memory accesses
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

Another emulator speedup series, shaving up to 400 cycles (25%) off
RMW instructions.

The performance of various instructions is now relatively flat:

 jump  919  (down from 2300)
 move  1075 (down from 2700)
 arith 1081 (down from 2600)
 load  1267 (down from 2800, 1400 after previous round)
 store 1213 (down from 2900, 1300 after previous round)
 RMW   1310 (down from 3200, 1700 after previous round)

The next low-hanging fruit is fetching instructions and initializing
the context.  Similar optimizations to those done here could be made
for instruction fetch.

Paolo Bonzini (6):
  KVM: emulate: simplify writeback
  KVM: emulate: abstract handling of memory operands
  KVM: export mark_page_dirty_in_slot
  KVM: emulate: introduce memory_prepare callback to speed up memory access
  KVM: emulate: activate memory access optimization
  KVM: emulate: extend memory access optimization to stores

 arch/x86/include/asm/kvm_emulate.h |  28 ++++++++++
 arch/x86/kvm/emulate.c             | 107 +++++++++++++++++++++++++++----------
 arch/x86/kvm/x86.c                 |  67 +++++++++++++++++++++++
 include/linux/kvm_host.h           |   6 +++
 virt/kvm/kvm_main.c                |  17 ++----
 5 files changed, 185 insertions(+), 40 deletions(-)

-- 
1.8.3.1



* [PATCH 1/6] KVM: emulate: simplify writeback
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

The "if/return" checks are useless, because we return X86EMUL_CONTINUE
anyway if we do not return.
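
Condensed from the hunk below, the change turns

	rc = segmented_write(ctxt, op->addr.mem, &op->val, op->bytes);
	if (rc != X86EMUL_CONTINUE)
		return rc;
	break;	/* the function then ends with "return X86EMUL_CONTINUE;" */

into a direct

	return segmented_write(ctxt, op->addr.mem, &op->val, op->bytes);

which is equivalent, because the break is only reached when rc is
X86EMUL_CONTINUE.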

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4a3584d419e5..b42184eccbcc 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1566,34 +1566,28 @@ static void write_register_operand(struct operand *op)
 
 static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
 {
-	int rc;
-
 	switch (op->type) {
 	case OP_REG:
 		write_register_operand(op);
 		break;
 	case OP_MEM:
 		if (ctxt->lock_prefix)
-			rc = segmented_cmpxchg(ctxt,
+			return segmented_cmpxchg(ctxt,
+						 op->addr.mem,
+						 &op->orig_val,
+						 &op->val,
+						 op->bytes);
+		else
+			return segmented_write(ctxt,
 					       op->addr.mem,
-					       &op->orig_val,
 					       &op->val,
 					       op->bytes);
-		else
-			rc = segmented_write(ctxt,
-					     op->addr.mem,
-					     &op->val,
-					     op->bytes);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
 		break;
 	case OP_MEM_STR:
-		rc = segmented_write(ctxt,
-				op->addr.mem,
-				op->data,
-				op->bytes * op->count);
-		if (rc != X86EMUL_CONTINUE)
-			return rc;
+		return segmented_write(ctxt,
+				       op->addr.mem,
+				       op->data,
+				       op->bytes * op->count);
 		break;
 	case OP_XMM:
 		write_sse_reg(ctxt, &op->vec_val, op->addr.xmm);
-- 
1.8.3.1




* [PATCH 2/6] KVM: emulate: abstract handling of memory operands
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

Abstract the pre-execution processing and the writeback of memory
operands into new functions.  We will soon do some work before execution
even for the destination of a move, so call the function in that case
too; but not for the memory operand of lea, invlpg and the like.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b42184eccbcc..c7ef72c1289e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1545,6 +1545,29 @@ exception:
 	return X86EMUL_PROPAGATE_FAULT;
 }
 
+static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
+				  struct operand *op)
+{
+	return segmented_read(ctxt, op->addr.mem, &op->val, op->bytes);
+}
+
+static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
+				  struct operand *op)
+{
+	return segmented_cmpxchg(ctxt, op->addr.mem,
+				 &op->orig_val,
+				 &op->val,
+				 op->bytes);
+}
+
+static int write_memory_operand(struct x86_emulate_ctxt *ctxt,
+				struct operand *op)
+{
+	return segmented_write(ctxt, op->addr.mem,
+			       &op->val,
+			       op->bytes);
+}
+
 static void write_register_operand(struct operand *op)
 {
 	/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
@@ -1572,16 +1595,9 @@ static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
 		break;
 	case OP_MEM:
 		if (ctxt->lock_prefix)
-			return segmented_cmpxchg(ctxt,
-						 op->addr.mem,
-						 &op->orig_val,
-						 &op->val,
-						 op->bytes);
+			return cmpxchg_memory_operand(ctxt, op);
 		else
-			return segmented_write(ctxt,
-					       op->addr.mem,
-					       &op->val,
-					       op->bytes);
+			return write_memory_operand(ctxt, op);
 		break;
 	case OP_MEM_STR:
 		return segmented_write(ctxt,
@@ -4588,16 +4604,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = segmented_read(ctxt, ctxt->src.addr.mem,
-				    ctxt->src.valptr, ctxt->src.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->src);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = segmented_read(ctxt, ctxt->src2.addr.mem,
-				    &ctxt->src2.val, ctxt->src2.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4608,8 +4622,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
 		/* optimisation - avoid slow emulated read if Mov */
-		rc = segmented_read(ctxt, ctxt->dst.addr.mem,
-				   &ctxt->dst.val, ctxt->dst.bytes);
+		rc = prepare_memory_operand(ctxt, &ctxt->dst);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
-- 
1.8.3.1




* [PATCH 3/6] KVM: export mark_page_dirty_in_slot
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 12 +++++-------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ba3180a5e6d3..56289b487ddd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -582,6 +582,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bf2fa29c8beb..98833e21d580 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -99,8 +99,6 @@ static void update_memslots(struct kvm_memslots *slots,
 			    struct kvm_memory_slot *new, u64 last_generation);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
-static void mark_page_dirty_in_slot(struct kvm *kvm,
-				    struct kvm_memory_slot *memslot, gfn_t gfn);
 
 bool kvm_rebooting;
 EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -1583,7 +1581,7 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	r = __copy_to_user((void __user *)ghc->hva, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+	mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);
 
 	return 0;
 }
@@ -1641,9 +1639,8 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-static void mark_page_dirty_in_slot(struct kvm *kvm,
-				    struct kvm_memory_slot *memslot,
-				    gfn_t gfn)
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
+			      gfn_t gfn)
 {
 	if (memslot && memslot->dirty_bitmap) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
@@ -1651,13 +1648,14 @@ static void mark_page_dirty_in_slot(struct kvm *kvm,
 		set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
 }
+EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
 
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
 
-- 
1.8.3.1




* [PATCH 4/6] KVM: emulate: introduce memory_prepare callback to speed up memory access
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

Emulating an RMW instruction currently walks the guest page tables
twice, once for the read and once for the write.  To avoid this, cache
the host virtual address of the operand and access it directly.

In fact, it turns out that the optimizations this makes possible
benefit all memory accesses.
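
For illustration only, and not part of the patch itself, a minimal
sketch of how a caller is expected to pair the two new hooks.  The
helper name and the fixed 64-bit operand size are made up for the
example; the real users, which keep a fallback to the existing
segmented_read/segmented_write path, come in the following patches.

	/* Hypothetical example: increment a 64-bit operand at linear
	 * address gva through the new fastpath callbacks. */
	static int example_rmw_inc(struct x86_emulate_ctxt *ctxt,
				   unsigned long gva)
	{
		void *opaque;
		unsigned long hva;
		u64 val;
		int rc;

		/* One translation/memslot lookup up front, instead of
		 * one per access. */
		rc = ctxt->ops->memory_prepare(ctxt, gva, sizeof(val),
					       &ctxt->exception, true,
					       &opaque, &hva);
		if (rc != X86EMUL_CONTINUE)
			return rc;

		if (kvm_is_error_hva(hva))
			return X86EMUL_UNHANDLEABLE;	/* no fastpath mapping */

		if (__copy_from_user(&val, (void __user *)hva, sizeof(val)))
			return X86EMUL_UNHANDLEABLE;
		val++;
		if (__copy_to_user((void __user *)hva, &val, sizeof(val)))
			return X86EMUL_UNHANDLEABLE;

		/* Lets the backend mark the page dirty, see
		 * mark_page_dirty_in_slot. */
		ctxt->ops->memory_finish(ctxt, opaque, hva);
		return X86EMUL_CONTINUE;
	}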

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 26 +++++++++++++++
 arch/x86/kvm/x86.c                 | 67 ++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h           |  5 +++
 virt/kvm/kvm_main.c                |  5 ---
 4 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f7b1e45eb753..4a580be2553e 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -167,6 +167,32 @@ struct x86_emulate_ops {
 				const void *new,
 				unsigned int bytes,
 				struct x86_exception *fault);
+
+	/*
+	 * memory_prepare: Prepare userspace access fastpath.
+	 *   @addr:     [IN ] Linear address to access.
+	 *   @bytes:    [IN ] Number of bytes to access.
+	 *   @write:    [IN ] True if *p_hva will be written to.
+	 *   @p_opaque: [OUT] Value passed back to memory_finish.
+	 *   @p_hva:    [OUT] Host virtual address for __copy_from/to_user.
+	 */
+	int (*memory_prepare)(struct x86_emulate_ctxt *ctxt,
+			      unsigned long addr,
+			      unsigned int bytes,
+			      struct x86_exception *exception,
+			      bool write,
+			      void **p_opaque,
+			      unsigned long *p_hva);
+
+	/*
+	 * memory_finish: Complete userspace access fastpath.
+	 *   @opaque: [OUT] Value passed back from memory_prepare.
+	 *   @hva:    [OUT] Host virtual address computed in memory_prepare.
+	 */
+	void (*memory_finish)(struct x86_emulate_ctxt *ctxt,
+			      void *p_opaque,
+			      unsigned long p_hva);
+
 	void (*invlpg)(struct x86_emulate_ctxt *ctxt, ulong addr);
 
 	int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e70a0e3227b1..64d0f171996b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4273,6 +4273,71 @@ static const struct read_write_emulator_ops write_emultor = {
 	.write = true,
 };
 
+static int emulator_memory_prepare(struct x86_emulate_ctxt *ctxt,
+				   unsigned long addr,
+				   unsigned int bytes,
+				   struct x86_exception *exception,
+				   bool write,
+				   void **p_opaque,
+				   unsigned long *p_hva)
+{
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	struct kvm_memory_slot *memslot;
+	int ret;
+	gpa_t gpa;
+	gfn_t gfn;
+	unsigned long hva;
+	
+	if (unlikely(((addr + bytes - 1) ^ addr) & PAGE_MASK))
+		goto no_hva;
+
+	ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, true);
+	if (ret != 0) {
+		if (ret < 0)
+			return X86EMUL_PROPAGATE_FAULT;
+		goto no_hva;
+	}
+
+	/* A (heavily) simplified version of kvm_gfn_to_hva_cache_init.  */
+	gfn = gpa >> PAGE_SHIFT;
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot)
+		goto no_hva;
+
+	if (write) {
+		if (memslot_is_readonly(memslot))
+			goto no_hva;
+
+		*p_opaque = memslot->dirty_bitmap ? memslot : NULL;
+	}
+
+	hva = __gfn_to_hva_memslot(memslot, gfn);
+	if (kvm_is_error_hva(hva))
+		goto no_hva;
+
+	*p_hva = hva + offset_in_page(gpa);
+	return X86EMUL_CONTINUE;
+
+no_hva:
+	*p_hva = KVM_HVA_ERR_BAD;
+	return X86EMUL_CONTINUE;
+}
+
+static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
+				   void *opaque,
+				   unsigned long hva)
+{
+	struct kvm_memory_slot *memslot;
+	gfn_t gfn;
+
+	if (!opaque)
+		return;
+
+	memslot = opaque;
+	gfn = hva_to_gfn_memslot(hva, memslot);
+	mark_page_dirty_in_slot(memslot, gfn);
+}
+
 static int emulator_read_write_onepage(unsigned long addr, void *val,
 				       unsigned int bytes,
 				       struct x86_exception *exception,
@@ -4816,6 +4881,8 @@ static const struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.write_std           = kvm_write_guest_virt_system,
 	.fetch               = kvm_fetch_guest_virt,
+	.memory_prepare      = emulator_memory_prepare,
+	.memory_finish       = emulator_memory_finish,
 	.read_emulated       = emulator_read_emulated,
 	.write_emulated      = emulator_write_emulated,
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 56289b487ddd..d4f213621c40 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -841,6 +841,11 @@ search_memslots(struct kvm_memslots *slots, gfn_t gfn)
 	return NULL;
 }
 
+static inline bool memslot_is_readonly(struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline struct kvm_memory_slot *
 __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 98833e21d580..edfc901c012f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1026,11 +1026,6 @@ out:
 	return size;
 }
 
-static bool memslot_is_readonly(struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
-- 
1.8.3.1




* [PATCH 5/6] KVM: emulate: activate memory access optimization
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

memory_prepare lets us replace segmented_read/segmented_write with direct
calls to __copy_from_user/__copy_to_user.

This saves about 70 cycles (15%) on arithmetic with a memory source
operand, and about 150 cycles (25%) on arithmetic with a memory
destination operand.
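
Condensed from the hunk below, with comments added (the memory_finish
call for read-only accesses is elided), the read fast path amounts to:

	/* op->hva was filled in by memory_prepare; KVM_HVA_ERR_BAD means
	 * "no direct mapping available, use the existing slow path". */
	if (likely(!kvm_is_error_hva(op->hva))) {
		if (!__copy_from_user(&op->val, (void __user *)op->hva, size))
			return X86EMUL_CONTINUE;	/* fast path */

		/* Should not happen: fault on a prepared hva. */
		op->hva = KVM_HVA_ERR_BAD;
	}
	return read_emulated(ctxt, gva, &op->val, size);	/* slow path */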

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  2 ++
 arch/x86/kvm/emulate.c             | 50 ++++++++++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 4a580be2553e..a572d4fabd4f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -263,6 +263,8 @@ struct operand {
 		u64 mm_val;
 		void *data;
 	};
+	unsigned long hva;
+	void *opaque;
 };
 
 struct fetch_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c7ef72c1289e..2c881e5cf5ad 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1546,9 +1546,37 @@ exception:
 }
 
 static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
-				  struct operand *op)
+				  struct operand *op,
+				  bool write)
 {
-	return segmented_read(ctxt, op->addr.mem, &op->val, op->bytes);
+	int rc;
+	unsigned long gva;
+	unsigned int size = op->bytes;
+
+	rc = linearize(ctxt, op->addr.mem, size, write, &gva);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	rc = ctxt->ops->memory_prepare(ctxt, gva, size,
+					&ctxt->exception, true,
+					&op->opaque, &op->hva);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	if (likely(!kvm_is_error_hva(op->hva))) {
+		rc = __copy_from_user(&op->val, (void __user *)op->hva,
+				      size);
+		if (!write)
+			ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
+
+		if (likely(!rc))
+			return X86EMUL_CONTINUE;
+
+		/* Should not happen.  */
+		op->hva = KVM_HVA_ERR_BAD;
+	}
+
+	return read_emulated(ctxt, gva, &op->val, size);
 }
 
 static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
@@ -1563,6 +1591,17 @@ static int cmpxchg_memory_operand(struct x86_emulate_ctxt *ctxt,
 static int write_memory_operand(struct x86_emulate_ctxt *ctxt,
 				struct operand *op)
 {
+	int rc;
+
+	if (likely(!kvm_is_error_hva(op->hva))) {
+		rc = __copy_to_user((void __user *)op->hva, &op->val,
+				    op->bytes);
+		ctxt->ops->memory_finish(ctxt, op->opaque, op->hva);
+
+		if (likely(!rc))
+			return X86EMUL_CONTINUE;
+	}
+
 	return segmented_write(ctxt, op->addr.mem,
 			       &op->val,
 			       op->bytes);
@@ -4604,14 +4643,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src);
+		rc = prepare_memory_operand(ctxt, &ctxt->src, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src2);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4622,7 +4661,8 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
 		/* optimisation - avoid slow emulated read if Mov */
-		rc = prepare_memory_operand(ctxt, &ctxt->dst);
+		rc = prepare_memory_operand(ctxt, &ctxt->dst,
+					    !(ctxt->d & NoWrite));
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
-- 
1.8.3.1




* [PATCH 6/6] KVM: emulate: extend memory access optimization to stores
From: Paolo Bonzini @ 2014-04-01 15:26 UTC
  To: linux-kernel; +Cc: kvm

Even on a store, the optimization saves about 20 clock cycles (4-5%),
mostly because the jump in write_memory_operand becomes much more
predictable.
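
With stores covered as well, a memory destination now always goes
through prepare_memory_operand (previously only non-Mov destinations
did); annotated for clarity, the new call site reads:

	rc = prepare_memory_operand(ctxt, &ctxt->dst,
				    !(ctxt->d & Mov),      /* read the old value unless plain move */
				    !(ctxt->d & NoWrite)); /* will be written back unless NoWrite */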

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2c881e5cf5ad..541296f62b40 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1547,7 +1547,7 @@ exception:
 
 static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
 				  struct operand *op,
-				  bool write)
+				  bool read, bool write)
 {
 	int rc;
 	unsigned long gva;
@@ -1563,6 +1563,10 @@ static int prepare_memory_operand(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
+	/* optimisation - avoid slow emulated read if Mov */
+	if (!read)
+		return X86EMUL_CONTINUE;
+
 	if (likely(!kvm_is_error_hva(op->hva))) {
 		rc = __copy_from_user(&op->val, (void __user *)op->hva,
 				      size);
@@ -4643,14 +4647,14 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	}
 
 	if ((ctxt->src.type == OP_MEM) && !(ctxt->d & NoAccess)) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src, false);
+		rc = prepare_memory_operand(ctxt, &ctxt->src, true, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		ctxt->src.orig_val64 = ctxt->src.val64;
 	}
 
 	if (ctxt->src2.type == OP_MEM) {
-		rc = prepare_memory_operand(ctxt, &ctxt->src2, false);
+		rc = prepare_memory_operand(ctxt, &ctxt->src2, true, false);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -4659,9 +4663,9 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto special_insn;
 
 
-	if ((ctxt->dst.type == OP_MEM) && !(ctxt->d & Mov)) {
-		/* optimisation - avoid slow emulated read if Mov */
+	if (ctxt->dst.type == OP_MEM) {
 		rc = prepare_memory_operand(ctxt, &ctxt->dst,
+					    !(ctxt->d & Mov),
 					    !(ctxt->d & NoWrite));
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
-- 
1.8.3.1


* Re: [RFC PATCH 0/6] KVM: x86: speedups for emulator memory accesses
From: Marcelo Tosatti @ 2014-04-21 20:12 UTC
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Tue, Apr 01, 2014 at 05:26:40PM +0200, Paolo Bonzini wrote:
> Another emulator speedup series, shaving up to 400 cycles (25%) off
> RMW instructions.
> 
> The performance of various instructions is now relatively flat:
> 
>  jump  919  (down from 2300)
>  move  1075 (down from 2700)
>  arith 1081 (down from 2600)
>  load  1267 (down from 2800, 1400 after previous round)
>  store 1213 (down from 2900, 1300 after previous round)
>  RMW   1310 (down from 3200, 1700 after previous round)
> 
> The next low-hanging fruit is fetching instructions and initializing
> the context.  Similar optimizations to those done here could be made
> for instruction fetch.
> 
> Paolo Bonzini (6):
>   KVM: emulate: simplify writeback
>   KVM: emulate: abstract handling of memory operands
>   KVM: export mark_page_dirty_in_slot
>   KVM: emulate: introduce memory_prepare callback to speed up memory access
>   KVM: emulate: activate memory access optimization
>   KVM: emulate: extend memory access optimization to stores
> 
>  arch/x86/include/asm/kvm_emulate.h |  28 ++++++++++
>  arch/x86/kvm/emulate.c             | 107 +++++++++++++++++++++++++++----------
>  arch/x86/kvm/x86.c                 |  67 +++++++++++++++++++++++
>  include/linux/kvm_host.h           |   6 +++
>  virt/kvm/kvm_main.c                |  17 ++----
>  5 files changed, 185 insertions(+), 40 deletions(-)
> 
> -- 
> 1.8.3.1

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>

