linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches
@ 2014-05-06  0:40 Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function Bandan Das
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Bandan Das @ 2014-05-06  0:40 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Marcelo Tosatti, Gleb Natapov, linux-kernel

This is my initial attempt at caching gva->gpa->hva translations. It's
pretty straightforward, with details in the individual patches.

I haven't yet looked into whether there are other possibilities for
speeding things up; I just thought I'd send these out since the numbers
are better:

567 cycles/emulated jump instruction
718 cycles/emulated move instruction
730 cycles/emulated arithmetic instruction
946 cycles/emulated memory load instruction
956 cycles/emulated memory store instruction
921 cycles/emulated memory RMW instruction

Old realmode.flat numbers, from the init ctxt changes posted at
https://lkml.org/lkml/2014/4/16/848:

639 cycles/emulated jump instruction (4.3%)
776 cycles/emulated move instruction (7.5%)
791 cycles/emulated arithmetic instruction (11%)
943 cycles/emulated memory load instruction (5.2%)
948 cycles/emulated memory store instruction (7.6%)
929 cycles/emulated memory RMW instruction (9.0%)

Bandan Das (3):
  KVM: x86: pass ctxt to fetch helper function
  KVM: x86: use memory_prepare in fetch helper function
  KVM: x86: cache userspace address for faster fetches

 arch/x86/include/asm/kvm_emulate.h |  7 +++++-
 arch/x86/kvm/x86.c                 | 46 ++++++++++++++++++++++++++++----------
 2 files changed, 40 insertions(+), 13 deletions(-)

-- 
1.8.3.1



* [RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function
  2014-05-06  0:40 [RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches Bandan Das
@ 2014-05-06  0:40 ` Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 2/3] KVM: x86: use memory_prepare in " Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches Bandan Das
  2 siblings, 0 replies; 8+ messages in thread
From: Bandan Das @ 2014-05-06  0:40 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Marcelo Tosatti, Gleb Natapov, linux-kernel

In the following patches, our address caching struct, which is embedded
within struct x86_emulate_ctxt, will need to be accessed, so pass ctxt
to the fetch helper instead of the vcpu.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/x86.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 122410d..17e3d661 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,10 +4061,11 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 }
 
 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
-				      struct kvm_vcpu *vcpu, u32 access,
+				      struct x86_emulate_ctxt *ctxt, u32 access,
 				      struct x86_exception *exception)
 {
 	void *data = val;
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
@@ -4098,7 +4099,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt,
 					  access | PFERR_FETCH_MASK,
 					  exception);
 }
@@ -4110,7 +4111,7 @@ int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, access,
 					  exception);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
@@ -4119,8 +4120,7 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 				      gva_t addr, void *val, unsigned int bytes,
 				      struct x86_exception *exception)
 {
-	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, 0, exception);
 }
 
 int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
-- 
1.8.3.1



* [RFC PATCH 2/3] KVM: x86: use memory_prepare in fetch helper function
  2014-05-06  0:40 [RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function Bandan Das
@ 2014-05-06  0:40 ` Bandan Das
  2014-05-06  8:27   ` Paolo Bonzini
  2014-05-06  9:46   ` Paolo Bonzini
  2014-05-06  0:40 ` [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches Bandan Das
  2 siblings, 2 replies; 8+ messages in thread
From: Bandan Das @ 2014-05-06  0:40 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Marcelo Tosatti, Gleb Natapov, linux-kernel

Add an insn fetch fastpath to the helper function. It's not that
arch.walk_mmu->gva_to_gpa can't be used, but let's piggyback on top of
an interface meant for our purpose.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/x86.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 17e3d661..cf69e3b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4065,29 +4065,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
 				      struct x86_exception *exception)
 {
 	void *data = val;
-	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
-		gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access,
-							    exception);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
+		unsigned long uaddr;
 
-		if (gpa == UNMAPPED_GVA)
-			return X86EMUL_PROPAGATE_FAULT;
-		ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
+		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
+						exception, false,
+						NULL, &uaddr);
+		if (ret != X86EMUL_CONTINUE)
+			return ret;
+
+		if (unlikely(kvm_is_error_hva(uaddr))) {
+			r = X86EMUL_PROPAGATE_FAULT;
+			return r;
+		}
+
+		ret = __copy_from_user(data, (void __user *)uaddr, toread);
 		if (ret < 0) {
 			r = X86EMUL_IO_NEEDED;
-			goto out;
+			return r;
 		}
 
+		ctxt->ops->memory_finish(ctxt, NULL, uaddr);
+
 		bytes -= toread;
 		data += toread;
 		addr += toread;
 	}
-out:
+
 	return r;
 }
 
-- 
1.8.3.1



* [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches
  2014-05-06  0:40 [RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function Bandan Das
  2014-05-06  0:40 ` [RFC PATCH 2/3] KVM: x86: use memory_prepare in " Bandan Das
@ 2014-05-06  0:40 ` Bandan Das
  2014-05-06  9:40   ` Paolo Bonzini
  2 siblings, 1 reply; 8+ messages in thread
From: Bandan Das @ 2014-05-06  0:40 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Marcelo Tosatti, Gleb Natapov, linux-kernel

On every instruction fetch, kvm_read_guest_virt_helper does the
gva->gpa translation followed by a search for the memslot. Store the
gva->hva mapping so that, if there's a match, we can call
__copy_from_user() directly.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  7 ++++++-
 arch/x86/kvm/x86.c                 | 33 +++++++++++++++++++++++----------
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 085d688..20ccde4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -323,10 +323,11 @@ struct x86_emulate_ctxt {
 	int (*execute)(struct x86_emulate_ctxt *ctxt);
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
 	/*
-	 * The following five fields are cleared together,
+	 * The following six fields are cleared together,
 	 * the rest are initialized unconditionally in x86_decode_insn
 	 * or elsewhere
 	 */
+	bool addr_cache_valid;
 	u8 rex_prefix;
 	u8 lock_prefix;
 	u8 rep_prefix;
@@ -348,6 +349,10 @@ struct x86_emulate_ctxt {
 	struct fetch_cache fetch;
 	struct read_cache io_read;
 	struct read_cache mem_read;
+	struct {
+		gfn_t gfn;
+		unsigned long uaddr;
+	} addr_cache;
 };
 
 /* Repeat String Operation Prefix */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf69e3b..7afcfc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
 		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
 		unsigned long uaddr;
+		gfn_t gfn = addr >> PAGE_SHIFT;
 
-		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
-						exception, false,
-						NULL, &uaddr);
-		if (ret != X86EMUL_CONTINUE)
-			return ret;
+		if (ctxt->addr_cache_valid &&
+		    (ctxt->addr_cache.gfn == gfn))
+			uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
+				offset_in_page(addr);
+		else {
+			ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
+							exception, false,
+							NULL, &uaddr);
+			if (ret != X86EMUL_CONTINUE)
+				return ret;
+
+			if (unlikely(kvm_is_error_hva(uaddr))) {
+				r = X86EMUL_PROPAGATE_FAULT;
+				return r;
+			}
 
-		if (unlikely(kvm_is_error_hva(uaddr))) {
-			r = X86EMUL_PROPAGATE_FAULT;
-			return r;
+			/* Cache gfn and hva */
+			ctxt->addr_cache.gfn = addr >> PAGE_SHIFT;
+			ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT;
+			ctxt->addr_cache_valid = true;
 		}
 
 		ret = __copy_from_user(data, (void __user *)uaddr, toread);
 		if (ret < 0) {
 			r = X86EMUL_IO_NEEDED;
+			/* Where else should we invalidate cache ? */
+			ctxt->ops->memory_finish(ctxt, NULL, uaddr);
 			return r;
 		}
 
-		ctxt->ops->memory_finish(ctxt, NULL, uaddr);
-
 		bytes -= toread;
 		data += toread;
 		addr += toread;
@@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
 	struct kvm_memory_slot *memslot;
 	gfn_t gfn;
 
+	ctxt->addr_cache_valid = false;
 	if (!opaque)
 		return;
 
-- 
1.8.3.1



* Re: [RFC PATCH 2/3] KVM: x86: use memory_prepare in fetch helper function
  2014-05-06  0:40 ` [RFC PATCH 2/3] KVM: x86: use memory_prepare in " Bandan Das
@ 2014-05-06  8:27   ` Paolo Bonzini
  2014-05-06  9:46   ` Paolo Bonzini
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-06  8:27 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: Marcelo Tosatti, Gleb Natapov, linux-kernel

On 06/05/2014 02:40, Bandan Das wrote:
> Insn fetch fastpath function. Not that
> arch.walk_mmu->gva_to_gpa can't be used but let's
> piggyback on top of interface meant for our purpose
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 17e3d661..cf69e3b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4065,29 +4065,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>  				      struct x86_exception *exception)
>  {
>  	void *data = val;
> -	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
>  	int r = X86EMUL_CONTINUE;
>
>  	while (bytes) {
> -		gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access,
> -							    exception);
>  		unsigned offset = addr & (PAGE_SIZE-1);
>  		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>  		int ret;
> +		unsigned long uaddr;
>
> -		if (gpa == UNMAPPED_GVA)
> -			return X86EMUL_PROPAGATE_FAULT;
> -		ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
> +		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
> +						exception, false,
> +						NULL, &uaddr);

You can call the function from x86.c directly, no need to go through 
ctxt->ops.

> +		if (ret != X86EMUL_CONTINUE)
> +			return ret;
> +
> +		if (unlikely(kvm_is_error_hva(uaddr))) {
> +			r = X86EMUL_PROPAGATE_FAULT;
> +			return r;
> +		}
> +
> +		ret = __copy_from_user(data, (void __user *)uaddr, toread);
>  		if (ret < 0) {
>  			r = X86EMUL_IO_NEEDED;
> -			goto out;
> +			return r;
>  		}
>
> +		ctxt->ops->memory_finish(ctxt, NULL, uaddr);

No need to call memory_finish, since you know the implementation 
(perhaps add a comment).
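
Going by the emulator_memory_finish() hunk in patch 3 (it returns early when
opaque is NULL), the call could become a comment along these lines -- just a
sketch:

	/*
	 * No memory_finish needed: we passed a NULL opaque cookie to
	 * memory_prepare, so there is nothing to release here.
	 */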

Paolo

>  		bytes -= toread;
>  		data += toread;
>  		addr += toread;
>  	}
> -out:
> +
>  	return r;
>  }
>
>



* Re: [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches
  2014-05-06  0:40 ` [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches Bandan Das
@ 2014-05-06  9:40   ` Paolo Bonzini
  2014-05-07  4:45     ` Bandan Das
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-06  9:40 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: Marcelo Tosatti, Gleb Natapov, linux-kernel

On 06/05/2014 02:40, Bandan Das wrote:
> On every instruction fetch, kvm_read_guest_virt_helper
> does the gva to gpa translation followed by searching for the
> memslot. Store the gva hva mapping so that if there's a match
> we can directly call __copy_from_user()
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/include/asm/kvm_emulate.h |  7 ++++++-
>  arch/x86/kvm/x86.c                 | 33 +++++++++++++++++++++++----------
>  2 files changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 085d688..20ccde4 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -323,10 +323,11 @@ struct x86_emulate_ctxt {
>  	int (*execute)(struct x86_emulate_ctxt *ctxt);
>  	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
>  	/*
> -	 * The following five fields are cleared together,
> +	 * The following six fields are cleared together,
>  	 * the rest are initialized unconditionally in x86_decode_insn
>  	 * or elsewhere
>  	 */
> +	bool addr_cache_valid;

You can just set gfn to -1 to invalidate the fetch.
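
A minimal sketch of that suggestion, reusing the fields from this patch (the
sentinel is illustrative, not from the posted series):

	/* an impossible gfn marks the cache invalid -- no bool needed */
	ctxt->addr_cache.gfn = (gfn_t)-1;

	/* the lookup stays the same */
	if (ctxt->addr_cache.gfn == gfn)
		uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
			offset_in_page(addr);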

>  	u8 rex_prefix;
>  	u8 lock_prefix;
>  	u8 rep_prefix;
> @@ -348,6 +349,10 @@ struct x86_emulate_ctxt {
>  	struct fetch_cache fetch;
>  	struct read_cache io_read;
>  	struct read_cache mem_read;
> +	struct {
> +		gfn_t gfn;
> +		unsigned long uaddr;
> +	} addr_cache;

This is not used by emulate.c, so it should be directly in struct 
kvm_vcpu.  You could invalidate it in init_emulate_ctxt, together with 
emulate_regs_need_sync_from_vcpu.

In the big picture, however, what we really want is to persist the cache 
across multiple instructions, and also cache the linearization of the 
address (where you add RIP and CS.base).  This would be a bigger source 
of improvement.  If you do that, the cache has to be indeed in 
x86_emulate_ctxt, but on the other hand the implementation needs to be 
in emulate.c.
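
A rough sketch of what that bigger cache might look like (field names here
are hypothetical, not from the posted series):

	/* persists across emulated instructions; keyed on the linear
	 * (CS.base + RIP) address of the page being fetched from */
	struct {
		unsigned long linear;	/* page-aligned linear fetch address */
		unsigned long uaddr;	/* matching host virtual address */
	} insn_addr_cache;

	/* hit test for a fetch at linear address la */
	if ((la & PAGE_MASK) == ctxt->insn_addr_cache.linear)
		uaddr = ctxt->insn_addr_cache.uaddr + offset_in_page(la);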

>  };
>
>  /* Repeat String Operation Prefix */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf69e3b..7afcfc7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>  		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>  		int ret;
>  		unsigned long uaddr;
> +		gfn_t gfn = addr >> PAGE_SHIFT;
>
> -		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
> -						exception, false,
> -						NULL, &uaddr);
> -		if (ret != X86EMUL_CONTINUE)
> -			return ret;
> +		if (ctxt->addr_cache_valid &&
> +		    (ctxt->addr_cache.gfn == gfn))
> +			uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
> +				offset_in_page(addr);
> +		else {
> +			ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
> +							exception, false,
> +							NULL, &uaddr);

Did you measure the hit rate, and the speedup after every patch?  My 
reading of the code is that the fetch is done only once per page, with 
the speedup coming from the microoptimization that patch 2 provides for 
single-page reads.  Single-page reads are indeed a very common case for 
the emulator.

I suggest to start with making patch 2 ready for inclusion.

Paolo

> +			if (ret != X86EMUL_CONTINUE)
> +				return ret;
> +
> +			if (unlikely(kvm_is_error_hva(uaddr))) {
> +				r = X86EMUL_PROPAGATE_FAULT;
> +				return r;
> +			}
>
> -		if (unlikely(kvm_is_error_hva(uaddr))) {
> -			r = X86EMUL_PROPAGATE_FAULT;
> -			return r;
> +			/* Cache gfn and hva */
> +			ctxt->addr_cache.gfn = addr >> PAGE_SHIFT;
> +			ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT;
> +			ctxt->addr_cache_valid = true;
>  		}
>
>  		ret = __copy_from_user(data, (void __user *)uaddr, toread);
>  		if (ret < 0) {
>  			r = X86EMUL_IO_NEEDED;
> +			/* Where else should we invalidate cache ? */
> +			ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>  			return r;
>  		}
>
> -		ctxt->ops->memory_finish(ctxt, NULL, uaddr);
> -
>  		bytes -= toread;
>  		data += toread;
>  		addr += toread;
> @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
>  	struct kvm_memory_slot *memslot;
>  	gfn_t gfn;
>
> +	ctxt->addr_cache_valid = false;
>  	if (!opaque)
>  		return;
>
>



* Re: [RFC PATCH 2/3] KVM: x86: use memory_prepare in fetch helper function
  2014-05-06  0:40 ` [RFC PATCH 2/3] KVM: x86: use memory_prepare in " Bandan Das
  2014-05-06  8:27   ` Paolo Bonzini
@ 2014-05-06  9:46   ` Paolo Bonzini
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-06  9:46 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: Marcelo Tosatti, Gleb Natapov, linux-kernel

On 06/05/2014 02:40, Bandan Das wrote:
> +		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
> +						exception, false,
> +						NULL, &uaddr);
> +		if (ret != X86EMUL_CONTINUE)
> +			return ret;
> +
> +		if (unlikely(kvm_is_error_hva(uaddr))) {
> +			r = X86EMUL_PROPAGATE_FAULT;
> +			return r;
> +		}

What you are doing here is basically optimizing 
kvm_read_guest_virt_helper because you know that all reads will be 
single-page and you do not need the "next_segment" in kvm_read_guest.
Good catch, but you can use kvm_read_guest_page instead of going through 
ctxt->ops->memory_prepare. :)
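
A minimal sketch of that alternative, based on the pre-patch helper (assuming
kvm_read_guest_page() can be used directly here):

	gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access,
						    exception);

	if (gpa == UNMAPPED_GVA)
		return X86EMUL_PROPAGATE_FAULT;
	/* single-page read, so skip kvm_read_guest()'s next_segment loop */
	ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data,
				  offset_in_page(gpa), toread);
	if (ret < 0)
		return X86EMUL_IO_NEEDED;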

Paolo

> +		ret = __copy_from_user(data, (void __user *)uaddr, toread);



* Re: [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches
  2014-05-06  9:40   ` Paolo Bonzini
@ 2014-05-07  4:45     ` Bandan Das
  0 siblings, 0 replies; 8+ messages in thread
From: Bandan Das @ 2014-05-07  4:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Marcelo Tosatti, Gleb Natapov, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 06/05/2014 02:40, Bandan Das wrote:
>> On every instruction fetch, kvm_read_guest_virt_helper
>> does the gva to gpa translation followed by searching for the
>> memslot. Store the gva hva mapping so that if there's a match
>> we can directly call __copy_from_user()
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_emulate.h |  7 ++++++-
>>  arch/x86/kvm/x86.c                 | 33 +++++++++++++++++++++++----------
>>  2 files changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
>> index 085d688..20ccde4 100644
>> --- a/arch/x86/include/asm/kvm_emulate.h
>> +++ b/arch/x86/include/asm/kvm_emulate.h
>> @@ -323,10 +323,11 @@ struct x86_emulate_ctxt {
>>  	int (*execute)(struct x86_emulate_ctxt *ctxt);
>>  	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
>>  	/*
>> -	 * The following five fields are cleared together,
>> +	 * The following six fields are cleared together,
>>  	 * the rest are initialized unconditionally in x86_decode_insn
>>  	 * or elsewhere
>>  	 */
>> +	bool addr_cache_valid;
>
> You can just set gfn to -1 to invalidate the fetch.
>
>>  	u8 rex_prefix;
>>  	u8 lock_prefix;
>>  	u8 rep_prefix;
>> @@ -348,6 +349,10 @@ struct x86_emulate_ctxt {
>>  	struct fetch_cache fetch;
>>  	struct read_cache io_read;
>>  	struct read_cache mem_read;
>> +	struct {
>> +		gfn_t gfn;
>> +		unsigned long uaddr;
>> +	} addr_cache;
>
> This is not used by emulate.c, so it should be directly in struct
> kvm_vcpu.  You could invalidate it in init_emulate_ctxt, together with
> emulate_regs_need_sync_from_vcpu.

Ok.

> In the big picture, however, what we really want is to persist the
> cache across multiple instructions, and also cache the linearization
> of the address (where you add RIP and CS.base).  This would be a
> bigger source of improvement.  If you do that, the cache has to be
> indeed in x86_emulate_ctxt, but on the other hand the implementation
> needs to be in emulate.c.
>
>>  };
>>
>>  /* Repeat String Operation Prefix */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cf69e3b..7afcfc7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>>  		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>>  		int ret;
>>  		unsigned long uaddr;
>> +		gfn_t gfn = addr >> PAGE_SHIFT;
>>
>> -		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> -						exception, false,
>> -						NULL, &uaddr);
>> -		if (ret != X86EMUL_CONTINUE)
>> -			return ret;
>> +		if (ctxt->addr_cache_valid &&
>> +		    (ctxt->addr_cache.gfn == gfn))
>> +			uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
>> +				offset_in_page(addr);
>> +		else {
>> +			ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> +							exception, false,
>> +							NULL, &uaddr);
>
> Did you measure the hit rate, and the speedup after every patch?  My
> reading of the code is that the fetch is done only once per page, with

Yes, I did. IIRC, patch 3 actually improved the speedup compared to 2.
A rough estimate for jmp: patch 2 reduced it to the low 600s, I guess
around 610, but I can get a fresh set of numbers.

So I'm not sure where the speedup is coming from; I will try to find out.

> the speedup coming from the microoptimization that patch 2 provides
> for single-page reads.  Single-page reads are indeed a very common
> case for the emulator.
>
> I suggest to start with making patch 2 ready for inclusion.
>
> Paolo
>
>> +			if (ret != X86EMUL_CONTINUE)
>> +				return ret;
>> +
>> +			if (unlikely(kvm_is_error_hva(uaddr))) {
>> +				r = X86EMUL_PROPAGATE_FAULT;
>> +				return r;
>> +			}
>>
>> -		if (unlikely(kvm_is_error_hva(uaddr))) {
>> -			r = X86EMUL_PROPAGATE_FAULT;
>> -			return r;
>> +			/* Cache gfn and hva */
>> +			ctxt->addr_cache.gfn = addr >> PAGE_SHIFT;
>> +			ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT;
>> +			ctxt->addr_cache_valid = true;
>>  		}
>>
>>  		ret = __copy_from_user(data, (void __user *)uaddr, toread);
>>  		if (ret < 0) {
>>  			r = X86EMUL_IO_NEEDED;
>> +			/* Where else should we invalidate cache ? */
>> +			ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>>  			return r;
>>  		}
>>
>> -		ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>> -
>>  		bytes -= toread;
>>  		data += toread;
>>  		addr += toread;
>> @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
>>  	struct kvm_memory_slot *memslot;
>>  	gfn_t gfn;
>>
>> +	ctxt->addr_cache_valid = false;
>>  	if (!opaque)
>>  		return;
>>
>>


