* [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
@ 2014-08-01 13:43 Jan Beulich
  2014-08-01 14:27 ` Jan Beulich
  2014-08-01 19:03 ` Tim Deegan
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2014-08-01 13:43 UTC
  To: xen-devel; +Cc: Tim Deegan, Keir Fraser

... to all internally handled MMIO regions. It is in particular the
HPET page that, e.g. on Windows Server 2012 R2, can get heavily
accessed, and hence avoiding the unnecessary lookups is rather
beneficial (in the reported case a 40+-vCPU guest would previously not
have booted at all, while with the hvm_hap_nested_page_fault() shortcut
alone it was able to boot up in 18 minutes [i.e. still room for
improvement]).

Note the apparently unrelated addition of an is_hvm_vcpu() check to the
__hvm_copy() code: AFAICT for PVH this shortcut should never have taken
effect (since there's no LAPIC in that case).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2779,11 +2779,14 @@ int hvm_hap_nested_page_fault(paddr_t gp
         }
     }
 
-    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
-     * a fast path for LAPIC accesses, skipping the p2m lookup. */
+    /*
+     * No need to do the P2M lookup for internally handled MMIO, benefiting
+     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+     * - newer Windows (like Server 2012) for HPET accesses.
+     */
     if ( !nestedhvm_vcpu_in_guestmode(v)
          && is_hvm_vcpu(v)
-         && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
+         && hvm_mmio_internal(gpa) )
     {
         if ( !handle_mmio() )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
@@ -3892,7 +3895,9 @@ static enum hvm_copy_result __hvm_copy(
 
     while ( todo > 0 )
     {
-        count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
+        paddr_t gpa = addr & ~PAGE_MASK;
+
+        count = min_t(int, PAGE_SIZE - gpa, todo);
 
         if ( flags & HVMCOPY_virt )
         {
@@ -3907,16 +3912,22 @@ static enum hvm_copy_result __hvm_copy(
                     hvm_inject_page_fault(pfec, addr);
                 return HVMCOPY_bad_gva_to_gfn;
             }
+            gpa |= (paddr_t)gfn << PAGE_SHIFT;
         }
         else
         {
             gfn = addr >> PAGE_SHIFT;
+            gpa = addr;
         }
 
-        /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
-         * a fast path for LAPIC accesses, skipping the p2m lookup. */
+        /*
+         * No need to do the P2M lookup for internally handled MMIO, benefiting
+         * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+         * - newer Windows (like Server 2012) for HPET accesses.
+         */
         if ( !nestedhvm_vcpu_in_guestmode(curr)
-             && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(curr))) )
+             && is_hvm_vcpu(curr)
+             && hvm_mmio_internal(gpa) )
             return HVMCOPY_bad_gfn_to_mfn;
 
         page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -163,6 +163,18 @@ static int hvm_mmio_access(struct vcpu *
     return rc;
 }
 
+bool_t hvm_mmio_internal(paddr_t gpa)
+{
+    struct vcpu *curr = current;
+    unsigned int i;
+
+    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
+        if ( hvm_mmio_handlers[i]->check_handler(curr, gpa) )
+            return 1;
+
+    return 0;
+}
+
 int hvm_mmio_intercept(ioreq_t *p)
 {
     struct vcpu *v = current;
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -91,6 +91,7 @@ static inline int hvm_buffered_io_interc
     return hvm_io_intercept(p, HVM_BUFFERED_IO);
 }
 
+bool_t hvm_mmio_internal(paddr_t gpa);
 int hvm_mmio_intercept(ioreq_t *p);
 int hvm_buffered_io_send(ioreq_t *p);
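
For readers without the Xen tree at hand, the idea behind hvm_mmio_internal() can be modelled in a few self-contained lines. This is only a sketch: the region table, the names prefixed sk_, and the addresses (conventional x86 defaults for the HPET and the LAPIC) are assumptions for illustration, whereas the patch itself asks each registered handler's check_handler() hook.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one internally emulated MMIO region. */
struct sk_mmio_region {
    const char *name;
    uint64_t    base;   /* first guest-physical address covered */
    uint64_t    size;   /* number of bytes covered */
};

/* Illustrative addresses only: conventional defaults, not read from Xen. */
static const struct sk_mmio_region sk_internal_regions[] = {
    { "hpet",  0xFED00000ULL, 0x400  },
    { "lapic", 0xFEE00000ULL, 0x1000 },
};

/* Returns true if gpa falls into any internally handled region. */
static bool sk_mmio_internal(uint64_t gpa)
{
    size_t i;
    size_t nr = sizeof(sk_internal_regions) / sizeof(sk_internal_regions[0]);

    for ( i = 0; i < nr; ++i )
        /* Unsigned wrap-around makes this a single-compare range check. */
        if ( gpa - sk_internal_regions[i].base < sk_internal_regions[i].size )
            return true;

    return false;
}
```

A caller would test sk_mmio_internal(gpa) first and only fall through to the (expensive) P2M lookup when it returns false, which is the ordering both hunks in hvm.c establish.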
 

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-01 13:43 [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups Jan Beulich
@ 2014-08-01 14:27 ` Jan Beulich
  2014-08-01 19:15   ` Tim Deegan
  2014-08-01 19:03 ` Tim Deegan
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-08-01 14:27 UTC
  To: xen-devel; +Cc: Tim Deegan, Keir Fraser

>>> On 01.08.14 at 15:43, <JBeulich@suse.com> wrote:
> ... to all internally handled MMIO regions. It is in particular the
> HPET page that, e.g. on Windows Server 2012 R2, can get heavily
> accessed, and hence avoiding the unnecessary lookups is rather
> beneficial (in the reported case a 40+-vCPU guest would previously not
> have booted at all, while with the hvm_hap_nested_page_fault() shortcut
> alone it was able to boot up in 18 minutes [i.e. still room for
> improvement]).

Btw., while I expect the second shortcut to also help a little (I was
only able to functionally test it, as I don't have a big enough system
around to meaningfully test that big a guest), while going through
all the pCPU-s' stack trace snapshots it occurred to me that for
hvm_hap_nested_page_fault()-induced MMIO emulation it is in many
cases quite pointless to "manually" do the VA->GPA translation, since
the handler already gets passed the offending GPA. Of course some
care would need to be taken to e.g. not use this on instructions
having more than one memory operand, or where the memory
operand crosses page boundaries, but all the information needed
for this would be available after decoding the instruction, i.e. well
in time before evaluating instruction operands. Am I overlooking
any other aspect making such an optimization unsafe?
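
The two preconditions named above (exactly one memory operand, and no page crossing) can be sketched as a predicate evaluated after instruction decode. The structure and all sk_ names are hypothetical and not taken from Xen's emulator; the sketch illustrates the checks only and makes no claim that they are sufficient.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical summary of what decode learned about the instruction. */
struct sk_decoded_insn {
    unsigned int mem_operands;  /* number of explicit memory operands */
    uint64_t     linear;        /* linear address of the memory operand */
    unsigned int bytes;         /* size of the access in bytes */
};

/* True if [addr, addr + bytes) extends beyond the page containing addr. */
static bool sk_crosses_page(uint64_t addr, unsigned int bytes)
{
    return (addr & (SK_PAGE_SIZE - 1)) + bytes > SK_PAGE_SIZE;
}

/* True only if both preconditions from the paragraph above hold. */
static bool sk_may_reuse_exit_gpa(const struct sk_decoded_insn *insn)
{
    return insn->mem_operands == 1 &&
           insn->bytes != 0 &&
           !sk_crosses_page(insn->linear, insn->bytes);
}
```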

Jan

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-01 13:43 [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups Jan Beulich
  2014-08-01 14:27 ` Jan Beulich
@ 2014-08-01 19:03 ` Tim Deegan
  1 sibling, 0 replies; 9+ messages in thread
From: Tim Deegan @ 2014-08-01 19:03 UTC
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

At 14:43 +0100 on 01 Aug (1406900582), Jan Beulich wrote:
> ... to all internally handled MMIO regions. It is in particular the
> HPET page that, e.g. on Windows Server 2012 R2, can get heavily
> accessed, and hence avoiding the unnecessary lookups is rather
> beneficial (in the reported case a 40+-vCPU guest would previously not
> have booted at all, while with the hvm_hap_nested_page_fault() shortcut
> alone it was able to boot up in 18 minutes [i.e. still room for
> improvement]).
> 
> Note the apparently unrelated addition of an is_hvm_vcpu() check to the
> __hvm_copy() code: AFAICT for PVH this shortcut should never have taken
> effect (since there's no LAPIC in that case).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Tim Deegan <tim@xen.org>

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-01 14:27 ` Jan Beulich
@ 2014-08-01 19:15   ` Tim Deegan
  2014-08-04  7:12     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2014-08-01 19:15 UTC
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

At 15:27 +0100 on 01 Aug (1406903251), Jan Beulich wrote:
> >>> On 01.08.14 at 15:43, <JBeulich@suse.com> wrote:
> > ... to all internally handled MMIO regions. It is in particular the
> > HPET page that, e.g. on Windows Server 2012 R2, can get heavily
> > accessed, and hence avoiding the unnecessary lookups is rather
> > beneficial (in the reported case a 40+-vCPU guest would previously not
> > have booted at all, while with the hvm_hap_nested_page_fault() shortcut
> > alone it was able to boot up in 18 minutes [i.e. still room for
> > improvement]).
> 
> Btw., while I expect the second shortcut to also help a little (I was
> only able to functionally test it, as I don't have a big enough system
> around to meaningfully test that big a guest), while going through
> all the pCPU-s' stack trace snapshots it occurred to me that for
> hvm_hap_nested_page_fault()-induced MMIO emulation it is in many
> cases quite pointless to "manually" do the VA->GPA translation, since
> the handler already gets passed the offending GPA. Of course some
> care would need to be taken to e.g. not use this on instructions
> having more than one memory operand, or where the memory
> operand crosses page boundaries, but all the information needed
> for this would be available after decoding the instruction, i.e. well
> in time before evaluating instruction operands. Am I overlooking
> any other aspect making such an optimization unsafe?

If Xen does its own instruction fetch and decode, then we have to be
careful about reusing any state from the original exit because of
self-modifying code.  (And yes, that is a serious concern -- I once
spent months trying to debug occasional memory corruption in the
self-modifying license-enforcement code on a system stress test
utility.)

So it would be OK to reuse the GPA from the exit if we could verify
that the GVA we see is the same as the original fault (since there can't
have been a TLB flush).  But IIRC the exit doesn't tell us the
original GVA. :(
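
The condition described above (reuse the GPA only when the GVA being emulated is known to equal the one that faulted) might look like the sketch below. Everything here is hypothetical: sk_exit_info models an exit that may or may not report the faulting GVA, and sk_slow_walk() is a toy stand-in for the real guest page-table walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what the hardware reported at exit time. */
struct sk_exit_info {
    bool     gva_valid;  /* did the exit report the faulting GVA at all? */
    uint64_t gva;        /* faulting guest-virtual address, if valid */
    uint64_t gpa;        /* faulting guest-physical address */
};

/* Toy stand-in for the full GVA->GPA walk (fixed offset, illustration only). */
static uint64_t sk_slow_walk(uint64_t gva)
{
    return gva + 0x100000ULL;
}

/* Use the exit's GPA only when it provably belongs to the emulated GVA. */
static uint64_t sk_translate(const struct sk_exit_info *info, uint64_t emul_gva)
{
    if ( info->gva_valid && info->gva == emul_gva )
        return info->gpa;

    /* No GVA from the exit, or a different one: take the safe path. */
    return sk_slow_walk(emul_gva);
}
```

When the exit carries no GVA, as suspected above, gva_valid stays false and the shortcut is never taken.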

Tim.

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-01 19:15   ` Tim Deegan
@ 2014-08-04  7:12     ` Jan Beulich
  2014-08-05 19:53       ` Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-08-04  7:12 UTC
  To: Tim Deegan; +Cc: xen-devel, Keir Fraser

>>> On 01.08.14 at 21:15, <tim@xen.org> wrote:
> At 15:27 +0100 on 01 Aug (1406903251), Jan Beulich wrote:
>> Btw., while I expect the second shortcut to also help a little (I was
>> only able to functionally test it, as I don't have a big enough system
>> around to meaningfully test that big a guest), while going through
>> all the pCPU-s' stack trace snapshots it occurred to me that for
>> hvm_hap_nested_page_fault()-induced MMIO emulation it is in many
>> cases quite pointless to "manually" do the VA->GPA translation, since
>> the handler already gets passed the offending GPA. Of course some
>> care would need to be taken to e.g. not use this on instructions
>> having more than one memory operand, or where the memory
>> operand crosses page boundaries, but all the information needed
>> for this would be available after decoding the instruction, i.e. well
>> in time before evaluating instruction operands. Am I overlooking
>> any other aspect making such an optimization unsafe?
> 
> If Xen does its own instruction fetch and decode, then we have to be
> careful about reusing any state from the original exit because of
> self-modifying code.  (And yes, that is a serious concern -- I once
> spent months trying to debug occasional memory corruption in the
> self-modifying license-enforcement code on a system stress test
> utility.)
> 
> So it would be OK to reuse the GPA from the exit if we could verify
> that the GVA we see is the same as the original fault (since there can't
> have been a TLB flush).  But IIRC the exit doesn't tell us the
> original GVA. :(

I don't think it needs to be as strict as this: For one, I wouldn't
intend to use the known GPA for instruction fetches at all. And
then I think if the instruction got modified between the exit and us
doing the emulation, using the known GPA with the wrong
instruction is as good or as bad as emulating an instruction that
didn't originally cause the exit. Furthermore there are sanity
checks we can do, like validating at least the offset into the page
(but yes, that would make any problems resulting from this
optimization even more difficult to reproduce/locate, although
failures of any such sanity check should probably have a
[conditional] log message associated, so one can spot that we
_would_ have done the optimization otherwise).
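
A minimal sketch of the page-offset sanity check mentioned above, assuming 4 KiB pages. The sk_ names, the mismatch counter and the logging switch are all invented for illustration. Note that matching offsets are a necessary condition only: they cannot prove that the GPA belongs to the same page.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SK_PAGE_SIZE 4096ULL  /* assumption: 4 KiB pages */

/* Hypothetical switch standing in for the "[conditional]" log message. */
static bool sk_log_mismatch = false;

/* Number of times the shortcut was refused by the check below. */
static unsigned long sk_mismatches;

/* Translation preserves the low 12 bits, so they must agree. */
static bool sk_offset_matches(uint64_t exit_gpa, uint64_t emul_linear)
{
    if ( ((exit_gpa ^ emul_linear) & (SK_PAGE_SIZE - 1)) == 0 )
        return true;

    ++sk_mismatches;
    if ( sk_log_mismatch )
        fprintf(stderr, "shortcut skipped: gpa=%#llx linear=%#llx\n",
                (unsigned long long)exit_gpa,
                (unsigned long long)emul_linear);

    return false;
}
```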

Jan

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-04  7:12     ` Jan Beulich
@ 2014-08-05 19:53       ` Tim Deegan
  2014-08-06  8:34         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2014-08-05 19:53 UTC
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

At 08:12 +0100 on 04 Aug (1407136337), Jan Beulich wrote:
> >>> On 01.08.14 at 21:15, <tim@xen.org> wrote:
> > If Xen does its own instruction fetch and decode, then we have to be
> > careful about reusing any state from the original exit because of
> > self-modifying code.  (And yes, that is a serious concern -- I once
> > spent months trying to debug occasional memory corruption in the
> > self-modifying license-enforcement code on a system stress test
> > utility.)
> > 
> > So it would be OK to reuse the GPA from the exit if we could verify
> > that the GVA we see is the same as the original fault (since there can't
> > have been a TLB flush).  But IIRC the exit doesn't tell us the
> > original GVA. :(
> 
> I don't think it needs to be as strict as this: For one, I wouldn't
> intend to use the known GPA for instruction fetches at all. And
> then I think if the instruction got modified between the exit and us
> doing the emulation, using the known GPA with the wrong
> instruction is as good or as bad as emulating an instruction that
> didn't originally cause the exit.

Not at all -- as I said, in the shadow code we did see the case where
we emulated a different instruction, and we do our best to handle it.
And at least there we have a clean failure mode: if we can't emulate
we crash.

Using the wrong GPA will silently corrupt memory and carry on, which
is about the worst failure mode a VMM can have (esp. if skipping the
GVA->GPA walk could allow a guest process to write to a read-only
mapping).  
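
To make the read-only concern concrete, here is a toy single-level walk showing what the GVA->GPA step enforces and what a shortcut would therefore bypass. The flat table, the sk_ names and the result codes are purely illustrative assumptions; real x86 paging is multi-level and checks more than a writable bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Toy page-table entry: one per virtual page, in a flat array. */
struct sk_pte {
    bool     present;
    bool     writable;
    uint64_t gfn;
};

enum sk_walk_result { SK_WALK_OK, SK_WALK_NOT_PRESENT, SK_WALK_WRITE_PROTECTED };

/* Translate gva, refusing writes through read-only mappings. */
static enum sk_walk_result sk_walk(const struct sk_pte *table, size_t nr,
                                   uint64_t gva, bool write, uint64_t *gpa)
{
    uint64_t idx = gva >> SK_PAGE_SHIFT;

    if ( idx >= nr || !table[idx].present )
        return SK_WALK_NOT_PRESENT;

    /* This is the check that is lost if the walk is skipped. */
    if ( write && !table[idx].writable )
        return SK_WALK_WRITE_PROTECTED;

    /* Frame number in the high bits, page offset in the low bits. */
    *gpa = (table[idx].gfn << SK_PAGE_SHIFT) |
           (gva & ((1ULL << SK_PAGE_SHIFT) - 1));

    return SK_WALK_OK;
}
```

A path that takes a GPA from elsewhere and writes to it never reaches the SK_WALK_WRITE_PROTECTED branch, which is the silent failure described above.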

I'd be extremely uncomfortable with anything like this unless there's a
way to get either the ifetch buffer or a partial decode out of the CPU
(which IIRC can't be done on x86 though it can on ARM).

Tim.

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-05 19:53       ` Tim Deegan
@ 2014-08-06  8:34         ` Jan Beulich
  2014-08-06  9:38           ` Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-08-06  8:34 UTC
  To: Tim Deegan; +Cc: xen-devel, Keir Fraser

>>> On 05.08.14 at 21:53, <tim@xen.org> wrote:
> At 08:12 +0100 on 04 Aug (1407136337), Jan Beulich wrote:
>> >>> On 01.08.14 at 21:15, <tim@xen.org> wrote:
>> > If Xen does its own instruction fetch and decode, then we have to be
>> > careful about reusing any state from the original exit because of
>> > self-modifying code.  (And yes, that is a serious concern -- I once
>> > spent months trying to debug occasional memory corruption in the
>> > self-modifying license-enforcement code on a system stress test
>> > utility.)
>> > 
>> > So it would be OK to reuse the GPA from the exit if we could verify
>> > that the GVA we see is the same as the original fault (since there can't
>> > have been a TLB flush).  But IIRC the exit doesn't tell us the
>> > original GVA. :(
>> 
>> I don't think it needs to be as strict as this: For one, I wouldn't
>> intend to use the known GPA for instruction fetches at all. And
>> then I think if the instruction got modified between the exit and us
>> doing the emulation, using the known GPA with the wrong
>> instruction is as good or as bad as emulating an instruction that
>> didn't originally cause the exit.
> 
> Not at all -- as I said, in the shadow code we did see the case where
> we emulated a different instruction, and we do our best to handle it.
> And at least there we have a clean failure mode: if we can't emulate
> we crash.
> 
> Using the wrong GPA will silently corrupt memory and carry on, which
> is about the worst failure mode a VMM can have (esp. if skipping the
> GVA->GPA walk could allow a guest process to write to a read-only
> mapping).  

Indeed, thinking about it again I agree. Fortunately it looks like we
do have ways to accelerate this nevertheless: On EPT, the handler
gets the linear address; we just need to make use of it. I just finished
drafting a corresponding patch - hopefully I'll get to trying it out later
today.

> I'd be extremely uncomfortable with anything like this unless there's a
> way to get either the ifetch buffer or a partial decode out of the CPU
> (which IIRC can't be done on x86 though it can on ARM).

On NPT we also get the instruction bytes on nested page faults, at
least on newer hardware. So maybe we could cook up something
along the lines you indicate by flagging that the instruction bytes
came from hardware.

Jan

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-06  8:34         ` Jan Beulich
@ 2014-08-06  9:38           ` Tim Deegan
  2014-08-11 12:26             ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2014-08-06  9:38 UTC
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

At 09:34 +0100 on 06 Aug (1407314042), Jan Beulich wrote:
> >>> On 05.08.14 at 21:53, <tim@xen.org> wrote:
> > At 08:12 +0100 on 04 Aug (1407136337), Jan Beulich wrote:
> >> >>> On 01.08.14 at 21:15, <tim@xen.org> wrote:
> >> > If Xen does its own instruction fetch and decode, then we have to be
> >> > careful about reusing any state from the original exit because of
> >> > self-modifying code.  (And yes, that is a serious concern -- I once
> >> > spent months trying to debug occasional memory corruption in the
> >> > self-modifying license-enforcement code on a system stress test
> >> > utility.)
> >> > 
> >> > So it would be OK to reuse the GPA from the exit if we could verify
> >> > that the GVA we see is the same as the original fault (since there can't
> >> > have been a TLB flush).  But IIRC the exit doesn't tell us the
> >> > original GVA. :(
> >> 
> >> I don't think it needs to be as strict as this: For one, I wouldn't
> >> intend to use the known GPA for instruction fetches at all. And
> >> then I think if the instruction got modified between the exit and us
> >> doing the emulation, using the known GPA with the wrong
> >> instruction is as good or as bad as emulating an instruction that
> >> didn't originally cause the exit.
> > 
> > Not at all -- as I said, in the shadow code we did see the case where
> > we emulated a different instruction, and we do our best to handle it.
> > And at least there we have a clean failure mode: if we can't emulate
> > we crash.
> > 
> > Using the wrong GPA will silently corrupt memory and carry on, which
> > is about the worst failure mode a VMM can have (esp. if skipping the
> > GVA->GPA walk could allow a guest process to write to a read-only
> > mapping).  
> 
> Indeed, thinking about it again I agree. Fortunately it looks like we
> do have ways to accelerate this nevertheless: On EPT, the handler
> gets the linear address; we just need to make use of it. I just finished
> drafting a corresponding patch - hopefully I'll get to trying it out later
> today.
> 
> > I'd be extremely uncomfortable with anything like this unless there's a
> > way to get either the ifetch buffer or a partial decode out of the CPU
> > (which IIRC can't be done on x86 though it can on ARM).
> 
> On NPT we also get the instruction bytes on nested page faults, at
> least on newer hardware. So maybe we could cook up something
> along the lines you indicate by flagging that the instruction bytes
> came from hardware.

Oh good -- yes, both of those approaches sound very encouraging.

Tim.

* Re: [PATCH] x86/HVM: extend LAPIC shortcuts around P2M lookups
  2014-08-06  9:38           ` Tim Deegan
@ 2014-08-11 12:26             ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2014-08-11 12:26 UTC
  To: Tim Deegan; +Cc: xen-devel, Keir Fraser

>>> On 06.08.14 at 11:38, <tim@xen.org> wrote:
> At 09:34 +0100 on 06 Aug (1407314042), Jan Beulich wrote:
>> >>> On 05.08.14 at 21:53, <tim@xen.org> wrote:
>> > I'd be extremely uncomfortable with anything like this unless there's a
>> > way to get either the ifetch buffer or a partial decode out of the CPU
>> > (which IIRC can't be done on x86 though it can on ARM).
>> 
>> On NPT we also get the instruction bytes on nested page faults, at
>> least on newer hardware. So maybe we could cook up something
>> along the lines you indicate by flagging that the instruction bytes
>> came from hardware.
> 
> Oh good -- yes, both of those approaches sound very encouraging.

Actually after some more thinking I concluded that on NPT we can't
go the outlined route: We can't distinguish primary memory accesses
from implicit ones (descriptor tables, TSS I/O permission bit map), and
hence can't deduce from just the instruction bytes having come from
hardware that a certain GPA can be used without first translating the
GLA.

Anyway, the changes we have so far seem to help the AMD side enough
that it now performs better than the Intel variant, even though the
(not yet formally posted) EPT-related patch obviously cannot have
any positive effect there (and it is because I want to be really
certain that I'm not introducing a regression that merely makes
things appear to perform better that I still haven't posted that
final patch; of course the conflict with Tamas's work also makes it
less than ideal to post right now).

Jan
