All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking
@ 2017-05-26 17:03 Andrew Cooper
  2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andrew Cooper @ 2017-05-26 17:03 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Julien Grall, Jan Beulich

These bugfixes finally allow my comprehensive XTF test (following several
bugfixes, and the added feature of the 0-level pagetable tests) to complete
successfully on a Skylake Server system, with PKU.

I know this is getting very tight to the line on 4.9, but it would be nice to
get these fixes in.

Andrew Cooper (2):
  Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  x86/pagewalk: Fix pagewalk's handling of instruction fetches

 xen/arch/x86/hvm/vmx/vmx.c   | 11 +++++------
 xen/arch/x86/mm/guest_walk.c | 22 +++++++++-------------
 2 files changed, 14 insertions(+), 19 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-26 17:03 [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Andrew Cooper
@ 2017-05-26 17:03 ` Andrew Cooper
  2017-05-29  8:48   ` Jan Beulich
  2017-05-31  7:09   ` Han, Huaitong
  2017-05-26 17:03 ` [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches Andrew Cooper
  2017-06-01 17:55 ` [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Julien Grall
  2 siblings, 2 replies; 21+ messages in thread
From: Andrew Cooper @ 2017-05-26 17:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, Jan Beulich, Huaitong Han

This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.

When determining Access Rights, Protection Keys only take effect when CR4.PKE
it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
PAE paging) skip the Protection Key control mechanism.

Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
not using paging, as such a guest is necesserily running with EFER.LME
disabled.

The {RD,WR}PKRU instructions are specified as being legal for use in any
operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
back of an unpaged guest, these instructions yield #UD despite the guest
seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Huaitong Han <huaitong.han@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8ef18a..58552c3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1673,13 +1673,12 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( !hvm_paging_enabled(v) )
         {
             /*
-             * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
-             * hardware. However Xen always uses paging mode to emulate guest
-             * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
-             * to be manually disabled when guest VCPU is in non-paging mode.
+             * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
+             * However Xen always uses paging mode to emulate guest non-paging
+             * mode. To emulate this behavior, SMEP/SMAP needs to be manually
+             * disabled when guest VCPU is in non-paging mode.
              */
-            v->arch.hvm_vcpu.hw_cr[4] &=
-                ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
+            v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
         }
         __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
         break;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-05-26 17:03 [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Andrew Cooper
  2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
@ 2017-05-26 17:03 ` Andrew Cooper
  2017-05-29  8:58   ` Jan Beulich
  2017-06-01 17:55 ` [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Julien Grall
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-05-26 17:03 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

Despite the claim in the comment (which was based partly on the code already
being like that, and mistaken reasoning because of Xen leaking NX into guest
context), reality differs.

Use of the SMAP feature without NX, or in a 2-level guest, demonstrate an
observable difference between reads and instruction fetches, despite
PFEC_insn_fetch not being reported in the #PF error code.  This demonstrates
that instruction fetches are distinguished from data reads even without
PFEC_insn_fetch being reported.

Alter the pagewalk logic to keep the pagewalk insn_fetch input intact, but
only conditionally report insn_fetch in the error code.  This logic is more
in line with the Intel SDM text:

 * I/D flag (bit 4).
   This flag is 1 if (1) the access causing the page-fault exception was an
   instruction fetch; and (2) either (a) CR4.SMEP = 1; or (b) both (i) CR4.PAE
   = 1 (either PAE paging or 4-level paging is in use); and (ii) IA32_EFER.NXE
   = 1. Otherwise, the flag is 0. This flag describes the access causing the
   page-fault exception, not the access rights specified by paging.

and the AMD SDM text:

 * I/D - Bit 4. If this bit is set to 1, it indicates that the access that
   caused the page fault was an instruction fetch. Otherwise, this bit is
   cleared to 0. This bit is only defined if no-execute feature is enabled
   (EFER.NXE=1 && CR4.PAE=1).

Curiously, the AMD manual doesn't mention SMEP despite some Fam16h processors
and all Fam17h processors supporting it.  Experimentally, it behaves as
described by Intel.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/arch/x86/mm/guest_walk.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 5c6a85b..972364f 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     ASSERT(!(walk & PFEC_implicit) ||
            !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
 
-    /*
-     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
-     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
-     * from data reads.
-     *
-     * This property can be demonstrated on real hardware by having NX and
-     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
-     * whether a pagefault occures for supervisor execution on user mappings.
-     */
-    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
-        walk &= ~PFEC_insn_fetch;
-
     perfc_incr(guest_walk);
     memset(gw, 0, sizeof(*gw));
     gw->va = va;
-    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
+    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
+
+    /*
+     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
+     * still distingueses instruction fetches during determination of access
+     * rights.
+     */
+    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
+        gw->pfec |= (walk & PFEC_insn_fetch);
 
 #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
@ 2017-05-29  8:48   ` Jan Beulich
  2017-05-31  7:09   ` Han, Huaitong
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-05-29  8:48 UTC (permalink / raw)
  To: Andrew Cooper, Huaitong Han; +Cc: Kevin Tian, Jun Nakajima, Xen-devel

>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> 
> When determining Access Rights, Protection Keys only take effect when CR4.PKE
> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
> PAE paging) skip the Protection Key control mechanism.
> 
> Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
> not using paging, as such a guest is necesserily running with EFER.LME
> disabled.

DYM EFER.LMA here?

> The {RD,WR}PKRU instructions are specified as being legal for use in any
> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
> back of an unpaged guest, these instructions yield #UD despite the guest
> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I would like to get clarification from Huaitong, however, on the
reason for the original change: According to the reasoning here,
it shouldn't have been an observed failure of some kind, but
merely the thinking that something may be wrong (but really
wasn't).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-05-26 17:03 ` [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches Andrew Cooper
@ 2017-05-29  8:58   ` Jan Beulich
  2017-05-29  9:03     ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-05-29  8:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>      ASSERT(!(walk & PFEC_implicit) ||
>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>  
> -    /*
> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
> -     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
> -     * from data reads.
> -     *
> -     * This property can be demonstrated on real hardware by having NX and
> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
> -     * whether a pagefault occures for supervisor execution on user mappings.
> -     */
> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
> -        walk &= ~PFEC_insn_fetch;
> -
>      perfc_incr(guest_walk);
>      memset(gw, 0, sizeof(*gw));
>      gw->va = va;
> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
> +
> +    /*
> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
> +     * still distingueses instruction fetches during determination of access
> +     * rights.
> +     */
> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
> +        gw->pfec |= (walk & PFEC_insn_fetch);
>  
>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */

Don't you another adjustment to

    if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
        /* Requested an instruction fetch and found NX? Fail. */
        goto out;

I can't see anything that would keep _PAGE_NX_BIT out of
ar if NX is not enabled.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-05-29  8:58   ` Jan Beulich
@ 2017-05-29  9:03     ` Andrew Cooper
  2017-05-29  9:15       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-05-29  9:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 29/05/2017 09:58, Jan Beulich wrote:
>>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/mm/guest_walk.c
>> +++ b/xen/arch/x86/mm/guest_walk.c
>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>>      ASSERT(!(walk & PFEC_implicit) ||
>>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>  
>> -    /*
>> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
>> -     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
>> -     * from data reads.
>> -     *
>> -     * This property can be demonstrated on real hardware by having NX and
>> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
>> -     * whether a pagefault occures for supervisor execution on user mappings.
>> -     */
>> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>> -        walk &= ~PFEC_insn_fetch;
>> -
>>      perfc_incr(guest_walk);
>>      memset(gw, 0, sizeof(*gw));
>>      gw->va = va;
>> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
>> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>> +
>> +    /*
>> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
>> +     * still distingueses instruction fetches during determination of access
>> +     * rights.
>> +     */
>> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>> +        gw->pfec |= (walk & PFEC_insn_fetch);
>>  
>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
> Don't you another adjustment to
>
>     if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>         /* Requested an instruction fetch and found NX? Fail. */
>         goto out;
>
> I can't see anything that would keep _PAGE_NX_BIT out of
> ar if NX is not enabled.

_PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-05-29  9:03     ` Andrew Cooper
@ 2017-05-29  9:15       ` Jan Beulich
  2017-06-01 10:19         ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-05-29  9:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 29.05.17 at 11:03, <andrew.cooper3@citrix.com> wrote:
> On 29/05/2017 09:58, Jan Beulich wrote:
>>>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/mm/guest_walk.c
>>> +++ b/xen/arch/x86/mm/guest_walk.c
>>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>>>      ASSERT(!(walk & PFEC_implicit) ||
>>>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>>  
>>> -    /*
>>> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
>>> -     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
>>> -     * from data reads.
>>> -     *
>>> -     * This property can be demonstrated on real hardware by having NX and
>>> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
>>> -     * whether a pagefault occures for supervisor execution on user mappings.
>>> -     */
>>> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>>> -        walk &= ~PFEC_insn_fetch;
>>> -
>>>      perfc_incr(guest_walk);
>>>      memset(gw, 0, sizeof(*gw));
>>>      gw->va = va;
>>> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
>>> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>>> +
>>> +    /*
>>> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
>>> +     * still distingueses instruction fetches during determination of access
>>> +     * rights.
>>> +     */
>>> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>>> +        gw->pfec |= (walk & PFEC_insn_fetch);
>>>  
>>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>> Don't you another adjustment to
>>
>>     if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>>         /* Requested an instruction fetch and found NX? Fail. */
>>         goto out;
>>
>> I can't see anything that would keep _PAGE_NX_BIT out of
>> ar if NX is not enabled.
> 
> _PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
> guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.

Ah, right. But perhaps worth having a respective ASSERT()
here, at once serving as documentation? In any event
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
  2017-05-29  8:48   ` Jan Beulich
@ 2017-05-31  7:09   ` Han, Huaitong
  2017-05-31  7:44     ` Andrew Cooper
  1 sibling, 1 reply; 21+ messages in thread
From: Han, Huaitong @ 2017-05-31  7:09 UTC (permalink / raw)
  To: andrew.cooper3; +Cc: Tian, Kevin, Nakajima, Jun, JBeulich, xen-devel

On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> 
> When determining Access Rights, Protection Keys only take effect when CR4.PKE
> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
> PAE paging) skip the Protection Key control mechanism.
> 
> Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
> not using paging, as such a guest is necesserily running with EFER.LME
> disabled.

Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
isn't necessary to clear CR4.PKE in non-paging mode.

> 
> The {RD,WR}PKRU instructions are specified as being legal for use in any
> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
> back of an unpaged guest, these instructions yield #UD despite the guest
> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
in vmcs loading, so, OSPKE should be always invisible, and #UD should
not be yielded too.

 
Reviewed-by: Huaitong Han <huaitong.han@intel.com>

> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Huaitong Han <huaitong.han@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c8ef18a..58552c3 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1673,13 +1673,12 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>          if ( !hvm_paging_enabled(v) )
>          {
>              /*
> -             * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
> -             * hardware. However Xen always uses paging mode to emulate guest
> -             * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
> -             * to be manually disabled when guest VCPU is in non-paging mode.
> +             * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
> +             * However Xen always uses paging mode to emulate guest non-paging
> +             * mode. To emulate this behavior, SMEP/SMAP needs to be manually
> +             * disabled when guest VCPU is in non-paging mode.
>               */
> -            v->arch.hvm_vcpu.hw_cr[4] &=
> -                ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
> +            v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
>          }
>          __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
>          break;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  7:09   ` Han, Huaitong
@ 2017-05-31  7:44     ` Andrew Cooper
  2017-05-31  7:56       ` Jan Beulich
  2017-05-31  8:14       ` Han, Huaitong
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Cooper @ 2017-05-31  7:44 UTC (permalink / raw)
  To: Han, Huaitong; +Cc: Tian, Kevin, Nakajima, Jun, JBeulich, xen-devel

On 31/05/2017 08:09, Han, Huaitong wrote:
> On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
>> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
>>
>> When determining Access Rights, Protection Keys only take effect when CR4.PKE
>> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
>> PAE paging) skip the Protection Key control mechanism.
>>
>> Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
>> not using paging, as such a guest is necesserily running with EFER.LME
>> disabled.
> Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
> isn't necessary to clear CR4.PKE in non-paging mode.
>
>> The {RD,WR}PKRU instructions are specified as being legal for use in any
>> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
>> back of an unpaged guest, these instructions yield #UD despite the guest
>> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
> guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
> in vmcs loading, so, OSPKE should be always invisible, and #UD should
> not be yielded too.

Remember that for HVM guests, Xen calculates OSPKE in software; it never
comes from hardware, as CPUID is an automatic VMEXIT.

The CPUID code uses the same source of information as a read from cr4,
so comes to the conclusion that OSPKE should be visible.

Therefore, when the guest looks at CPUID, it sees OSPKE set even though
hardware would come to the opposite conclusion.

> Reviewed-by: Huaitong Han <huaitong.han@intel.com>

Does this stand despite the OSPKE issue?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  7:44     ` Andrew Cooper
@ 2017-05-31  7:56       ` Jan Beulich
  2017-05-31  8:06         ` Andrew Cooper
  2017-05-31  8:14       ` Han, Huaitong
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-05-31  7:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Huaitong Han, Kevin Tian, Jun Nakajima, xen-devel

>>> On 31.05.17 at 09:44, <andrew.cooper3@citrix.com> wrote:
> On 31/05/2017 08:09, Han, Huaitong wrote:
>> On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
>>> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
>>>
>>> When determining Access Rights, Protection Keys only take effect when 
> CR4.PKE
>>> it set, and 4-level paging is active.  All other circumstances (notibly, 
> 32bit
>>> PAE paging) skip the Protection Key control mechanism.
>>>
>>> Therefore, we do not need to clear CR4.PKE behind the back of a guest which 
> is
>>> not using paging, as such a guest is necesserily running with EFER.LME
>>> disabled.
>> Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
>> isn't necessary to clear CR4.PKE in non-paging mode.
>>
>>> The {RD,WR}PKRU instructions are specified as being legal for use in any
>>> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
>>> back of an unpaged guest, these instructions yield #UD despite the guest
>>> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
>> If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
>> guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
>> in vmcs loading, so, OSPKE should be always invisible, and #UD should
>> not be yielded too.
> 
> Remember that for HVM guests, Xen calculates OSPKE in software; it never
> comes from hardware, as CPUID is an automatic VMEXIT.
> 
> The CPUID code uses the same source of information as a read from cr4,
> so comes to the conclusion that OSPKE should be visible.
> 
> Therefore, when the guest looks at CPUID, it sees OSPKE set even though
> hardware would come to the opposite conclusion.

Shouldn't we correct this (independent of the patch here)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  7:56       ` Jan Beulich
@ 2017-05-31  8:06         ` Andrew Cooper
  2017-05-31  8:12           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-05-31  8:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Huaitong Han, Kevin Tian, Jun Nakajima, xen-devel

On 31/05/2017 08:56, Jan Beulich wrote:
>>>> On 31.05.17 at 09:44, <andrew.cooper3@citrix.com> wrote:
>> On 31/05/2017 08:09, Han, Huaitong wrote:
>>> On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
>>>> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
>>>>
>>>> When determining Access Rights, Protection Keys only take effect when 
>> CR4.PKE
>>>> it set, and 4-level paging is active.  All other circumstances (notibly, 
>> 32bit
>>>> PAE paging) skip the Protection Key control mechanism.
>>>>
>>>> Therefore, we do not need to clear CR4.PKE behind the back of a guest which 
>> is
>>>> not using paging, as such a guest is necesserily running with EFER.LME
>>>> disabled.
>>> Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
>>> isn't necessary to clear CR4.PKE in non-paging mode.
>>>
>>>> The {RD,WR}PKRU instructions are specified as being legal for use in any
>>>> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
>>>> back of an unpaged guest, these instructions yield #UD despite the guest
>>>> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
>>> If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
>>> guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
>>> in vmcs loading, so, OSPKE should be always invisible, and #UD should
>>> not be yielded too.
>> Remember that for HVM guests, Xen calculates OSPKE in software; it never
>> comes from hardware, as CPUID is an automatic VMEXIT.
>>
>> The CPUID code uses the same source of information as a read from cr4,
>> so comes to the conclusion that OSPKE should be visible.
>>
>> Therefore, when the guest looks at CPUID, it sees OSPKE set even though
>> hardware would come to the opposite conclusion.
> Shouldn't we correct this (independent of the patch here)?

No, I don't think so.  That would involve the generic cpuid code looking
at GUEST_CR4 and making decisions contrary to what is described in the
manuals.

Besides, it very definitely should be visible in a read of CR4 (because
the guest did really set it), which means OSPKE should be visible in CPUID.

This corner case (along with others) will soon be regression tested in
the (newly-introduced) 0-level paging pagetable-emulation XTF test,
which I put together after stumbling blindly into this particular #UD
while investigating a different issue.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  8:06         ` Andrew Cooper
@ 2017-05-31  8:12           ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-05-31  8:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Huaitong Han, Kevin Tian, Jun Nakajima, xen-devel

>>> On 31.05.17 at 10:06, <andrew.cooper3@citrix.com> wrote:
> On 31/05/2017 08:56, Jan Beulich wrote:
>>>>> On 31.05.17 at 09:44, <andrew.cooper3@citrix.com> wrote:
>>> On 31/05/2017 08:09, Han, Huaitong wrote:
>>>> On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
>>>>> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
>>>>>
>>>>> When determining Access Rights, Protection Keys only take effect when 
>>> CR4.PKE
>>>>> it set, and 4-level paging is active.  All other circumstances (notibly, 
>>> 32bit
>>>>> PAE paging) skip the Protection Key control mechanism.
>>>>>
>>>>> Therefore, we do not need to clear CR4.PKE behind the back of a guest which 
>>> is
>>>>> not using paging, as such a guest is necesserily running with EFER.LME
>>>>> disabled.
>>>> Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
>>>> isn't necessary to clear CR4.PKE in non-paging mode.
>>>>
>>>>> The {RD,WR}PKRU instructions are specified as being legal for use in any
>>>>> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
>>>>> back of an unpaged guest, these instructions yield #UD despite the guest
>>>>> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
>>>> If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
>>>> guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
>>>> in vmcs loading, so, OSPKE should be always invisible, and #UD should
>>>> not be yielded too.
>>> Remember that for HVM guests, Xen calculates OSPKE in software; it never
>>> comes from hardware, as CPUID is an automatic VMEXIT.
>>>
>>> The CPUID code uses the same source of information as a read from cr4,
>>> so comes to the conclusion that OSPKE should be visible.
>>>
>>> Therefore, when the guest looks at CPUID, it sees OSPKE set even though
>>> hardware would come to the opposite conclusion.
>> Shouldn't we correct this (independent of the patch here)?
> 
> No, I don't think so.  That would involve the generic cpuid code looking
> at GUEST_CR4 and making decisions contrary to what is described in the
> manuals.
> 
> Besides, it very definitely should be visible in a read of CR4 (because
> the guest did really set it), which means OSPKE should be visible in CPUID.

Oh, then I misunderstood your earlier reply, taking it that we
wrongly show the flag as set to the guest.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  7:44     ` Andrew Cooper
  2017-05-31  7:56       ` Jan Beulich
@ 2017-05-31  8:14       ` Han, Huaitong
  2017-06-01  2:15         ` Tian, Kevin
  1 sibling, 1 reply; 21+ messages in thread
From: Han, Huaitong @ 2017-05-31  8:14 UTC (permalink / raw)
  To: andrew.cooper3; +Cc: Tian, Kevin, Nakajima, Jun, JBeulich, xen-devel

On Wed, 2017-05-31 at 08:44 +0100, Andrew Cooper wrote:
> On 31/05/2017 08:09, Han, Huaitong wrote:
> > On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
> >> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> >>
> >> When determining Access Rights, Protection Keys only take effect when CR4.PKE
> >> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
> >> PAE paging) skip the Protection Key control mechanism.
> >>
> >> Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
> >> not using paging, as such a guest is necesserily running with EFER.LME
> >> disabled.
> > Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
> > isn't necessary to clear CR4.PKE in non-paging mode.
> >
> >> The {RD,WR}PKRU instructions are specified as being legal for use in any
> >> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
> >> back of an unpaged guest, these instructions yield #UD despite the guest
> >> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> > If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
> > guest does set CR4_PKE in non-paging mode, then CR4_PKE would be cleared
> > in vmcs loading, so, OSPKE should be always invisible, and #UD should
> > not be yielded too.
> 
> Remember that for HVM guests, Xen calculates OSPKE in software; it never
> comes from hardware, as CPUID is an automatic VMEXIT.
> 
> The CPUID code uses the same source of information as a read from cr4,
> so comes to the conclusion that OSPKE should be visible.
> 
> Therefore, when the guest looks at CPUID, it sees OSPKE set even though
> hardware would come to the opposite conclusion.

Yes, I get the reason: the hw_cr4 is updated, but the guest_cr4 is not
updated, so the OSPKE is visible.

> 
> > Reviewed-by: Huaitong Han <huaitong.han@intel.com>
> 
> Does this stand despite the OSPKE issue?
Yes, I have no comments now.

> 
> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"
  2017-05-31  8:14       ` Han, Huaitong
@ 2017-06-01  2:15         ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2017-06-01  2:15 UTC (permalink / raw)
  To: Han, Huaitong, andrew.cooper3; +Cc: Nakajima, Jun, JBeulich, xen-devel

> From: Han, Huaitong
> Sent: Wednesday, May 31, 2017 4:15 PM
> 
> On Wed, 2017-05-31 at 08:44 +0100, Andrew Cooper wrote:
> > On 31/05/2017 08:09, Han, Huaitong wrote:
> > > On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
> > >> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> > >>
> > >> When determining Access Rights, Protection Keys only take effect when
> CR4.PKE
> > >> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
> > >> PAE paging) skip the Protection Key control mechanism.
> > >>
> > >> Therefore, we do not need to clear CR4.PKE behind the back of a guest
> which is
> > >> not using paging, as such a guest is necesserily running with EFER.LME
> > >> disabled.
> > > Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
> > > isn't necessary to clear CR4.PKE in non-paging mode.
> > >
> > >> The {RD,WR}PKRU instructions are specified as being legal for use in any
> > >> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind
> the
> > >> back of an unpaged guest, these instructions yield #UD despite the guest
> > >> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> > > If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
> > > guest does set CR4_PKE in non-paging mode, then CR4_PKE would be
> cleared
> > > in vmcs loading, so, OSPKE should be always invisible, and #UD should
> > > not be yielded too.
> >
> > Remember that for HVM guests, Xen calculates OSPKE in software; it never
> > comes from hardware, as CPUID is an automatic VMEXIT.
> >
> > The CPUID code uses the same source of information as a read from cr4,
> > so comes to the conclusion that OSPKE should be visible.
> >
> > Therefore, when the guest looks at CPUID, it sees OSPKE set even though
> > hardware would come to the opposite conclusion.
> 
> Yes, I get the reason: the hw_cr4 is updated, but the guest_cr4 is not
> updated, so the OSPKE is visible.
> 
> >
> > > Reviewed-by: Huaitong Han <huaitong.han@intel.com>
> >
> > Does this stand despite the OSPKE issue?
> Yes, I have no comments now.
> 

Acked-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-05-29  9:15       ` Jan Beulich
@ 2017-06-01 10:19         ` Andrew Cooper
  2017-06-01 10:51           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-06-01 10:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 29/05/17 10:15, Jan Beulich wrote:
>>>> On 29.05.17 at 11:03, <andrew.cooper3@citrix.com> wrote:
>> On 29/05/2017 09:58, Jan Beulich wrote:
>>>>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/mm/guest_walk.c
>>>> +++ b/xen/arch/x86/mm/guest_walk.c
>>>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>>>>      ASSERT(!(walk & PFEC_implicit) ||
>>>>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>>>  
>>>> -    /*
>>>> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
>>>> -     * SMEP are enabled.  Otherwise, instruction fetches are indistinguishable
>>>> -     * from data reads.
>>>> -     *
>>>> -     * This property can be demonstrated on real hardware by having NX and
>>>> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC determines
>>>> -     * whether a pagefault occures for supervisor execution on user mappings.
>>>> -     */
>>>> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>>>> -        walk &= ~PFEC_insn_fetch;
>>>> -
>>>>      perfc_incr(guest_walk);
>>>>      memset(gw, 0, sizeof(*gw));
>>>>      gw->va = va;
>>>> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
>>>> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>>>> +
>>>> +    /*
>>>> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
>>>> +     * still distingueses instruction fetches during determination of access
>>>> +     * rights.
>>>> +     */
>>>> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>>>> +        gw->pfec |= (walk & PFEC_insn_fetch);
>>>>  
>>>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>>> Don't you another adjustment to
>>>
>>>     if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>>>         /* Requested an instruction fetch and found NX? Fail. */
>>>         goto out;
>>>
>>> I can't see anything that would keep _PAGE_NX_BIT out of
>>> ar if NX is not enabled.
>> _PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
>> guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.
> Ah, right. But perhaps worth having a respective ASSERT()
> here, at once serving as documentation?

I could, but it would feel be out of place.  NX being incorrectly set is
a translation failure, and by definition, the translation needs to have
succeeded before permissions get considered.

Would this clarification be acceptable?

index 5c6a85b..6d6b454 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -360,8 +360,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain
*p2m,
     gw->pfec |= PFEC_page_present;
 
     /*
-     * The pagetable walk has returned a successful translation.  Now check
-     * access rights to see whether the access should succeed.
+     * The pagetable walk has returned a successful translation (i.e. All
+     * PTEs are present and have no reserved bits set).  Now check access
+     * rights to see whether the access should succeed.
      */
     ar = (ar_and & AR_ACCUM_AND) | (ar_or & AR_ACCUM_OR);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-06-01 10:19         ` Andrew Cooper
@ 2017-06-01 10:51           ` Jan Beulich
  2017-06-01 11:22             ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-06-01 10:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 01.06.17 at 12:19, <andrew.cooper3@citrix.com> wrote:
> On 29/05/17 10:15, Jan Beulich wrote:
>>>>> On 29.05.17 at 11:03, <andrew.cooper3@citrix.com> wrote:
>>> On 29/05/2017 09:58, Jan Beulich wrote:
>>>>>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/mm/guest_walk.c
>>>>> +++ b/xen/arch/x86/mm/guest_walk.c
>>>>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain 
> *p2m,
>>>>>      ASSERT(!(walk & PFEC_implicit) ||
>>>>>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>>>>  
>>>>> -    /*
>>>>> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX 
> or
>>>>> -     * SMEP are enabled.  Otherwise, instruction fetches are 
> indistinguishable
>>>>> -     * from data reads.
>>>>> -     *
>>>>> -     * This property can be demonstrated on real hardware by having NX and
>>>>> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC 
> determines
>>>>> -     * whether a pagefault occures for supervisor execution on user 
> mappings.
>>>>> -     */
>>>>> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>>>>> -        walk &= ~PFEC_insn_fetch;
>>>>> -
>>>>>      perfc_incr(guest_walk);
>>>>>      memset(gw, 0, sizeof(*gw));
>>>>>      gw->va = va;
>>>>> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
>>>>> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>>>>> +
>>>>> +    /*
>>>>> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  
> Hardware
>>>>> +     * still distingueses instruction fetches during determination of 
> access
>>>>> +     * rights.
>>>>> +     */
>>>>> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>>>>> +        gw->pfec |= (walk & PFEC_insn_fetch);
>>>>>  
>>>>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>>>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>>>> Don't you another adjustment to
>>>>
>>>>     if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>>>>         /* Requested an instruction fetch and found NX? Fail. */
>>>>         goto out;
>>>>
>>>> I can't see anything that would keep _PAGE_NX_BIT out of
>>>> ar if NX is not enabled.
>>> _PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
>>> guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.
>> Ah, right. But perhaps worth having a respective ASSERT()
>> here, at once serving as documentation?
> 
> I could, but it would feel be out of place.  NX being incorrectly set is
> a translation failure, and by definition, the translation needs to have
> succeeded before permissions get considered.
> 
> Would this clarification be acceptable?
> 
> index 5c6a85b..6d6b454 100644
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -360,8 +360,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>      gw->pfec |= PFEC_page_present;
>  
>      /*
> -     * The pagetable walk has returned a successful translation.  Now check
> -     * access rights to see whether the access should succeed.
> +     * The pagetable walk has returned a successful translation (i.e. All
> +     * PTEs are present and have no reserved bits set).  Now check access
> +     * rights to see whether the access should succeed.
>       */

While this perhaps is a worthwhile addition, my original request
really was to make more visible around the place where it matters
that the NX bit is part of the reserved ones when NX is off. Hence
I'm not sure the comment change is worthwhile, and if you dislike
adding the suggested ASSERT() I won't the patch be left as is.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-06-01 10:51           ` Jan Beulich
@ 2017-06-01 11:22             ` Andrew Cooper
  2017-06-01 12:06               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-06-01 11:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, Xen-devel

On 01/06/17 11:51, Jan Beulich wrote:
>>>> On 01.06.17 at 12:19, <andrew.cooper3@citrix.com> wrote:
>> On 29/05/17 10:15, Jan Beulich wrote:
>>>>>> On 29.05.17 at 11:03, <andrew.cooper3@citrix.com> wrote:
>>>> On 29/05/2017 09:58, Jan Beulich wrote:
>>>>>>>> On 26.05.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/arch/x86/mm/guest_walk.c
>>>>>> +++ b/xen/arch/x86/mm/guest_walk.c
>>>>>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain 
>> *p2m,
>>>>>>      ASSERT(!(walk & PFEC_implicit) ||
>>>>>>             !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>>>>>  
>>>>>> -    /*
>>>>>> -     * PFEC_insn_fetch is only used as an input to pagetable walking if NX 
>> or
>>>>>> -     * SMEP are enabled.  Otherwise, instruction fetches are 
>> indistinguishable
>>>>>> -     * from data reads.
>>>>>> -     *
>>>>>> -     * This property can be demonstrated on real hardware by having NX and
>>>>>> -     * SMEP inactive, but SMAP active, and observing that EFLAGS.AC 
>> determines
>>>>>> -     * whether a pagefault occures for supervisor execution on user 
>> mappings.
>>>>>> -     */
>>>>>> -    if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>>>>>> -        walk &= ~PFEC_insn_fetch;
>>>>>> -
>>>>>>      perfc_incr(guest_walk);
>>>>>>      memset(gw, 0, sizeof(*gw));
>>>>>>      gw->va = va;
>>>>>> -    gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
>>>>>> +    gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>>>>>> +
>>>>>> +    /*
>>>>>> +     * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  
>> Hardware
>>>>>> +     * still distingueses instruction fetches during determination of 
>> access
>>>>>> +     * rights.
>>>>>> +     */
>>>>>> +    if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>>>>>> +        gw->pfec |= (walk & PFEC_insn_fetch);
>>>>>>  
>>>>>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>>>>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>>>>> Don't you another adjustment to
>>>>>
>>>>>     if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>>>>>         /* Requested an instruction fetch and found NX? Fail. */
>>>>>         goto out;
>>>>>
>>>>> I can't see anything that would keep _PAGE_NX_BIT out of
>>>>> ar if NX is not enabled.
>>>> _PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
>>>> guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.
>>> Ah, right. But perhaps worth having a respective ASSERT()
>>> here, at once serving as documentation?
>> I could, but it would feel be out of place.  NX being incorrectly set is
>> a translation failure, and by definition, the translation needs to have
>> succeeded before permissions get considered.
>>
>> Would this clarification be acceptable?
>>
>> index 5c6a85b..6d6b454 100644
>> --- a/xen/arch/x86/mm/guest_walk.c
>> +++ b/xen/arch/x86/mm/guest_walk.c
>> @@ -360,8 +360,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>>      gw->pfec |= PFEC_page_present;
>>  
>>      /*
>> -     * The pagetable walk has returned a successful translation.  Now check
>> -     * access rights to see whether the access should succeed.
>> +     * The pagetable walk has returned a successful translation (i.e. All
>> +     * PTEs are present and have no reserved bits set).  Now check access
>> +     * rights to see whether the access should succeed.
>>       */
> While this perhaps is a worthwhile addition, my original request
> really was to make more visible around the place where it matters
> that the NX bit is part of the reserved ones when NX is off. Hence
> I'm not sure the comment change is worthwhile, and if you dislike
> adding the suggested ASSERT() I won't the patch be left as is.

I presume you means something like you won't mind if the patch is left
as-is?

How about this?

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 972364f..6055fec 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -356,11 +356,19 @@ guest_walk_tables(struct vcpu *v, struct
p2m_domain *p2m,
     gw->pfec |= PFEC_page_present;
 
     /*
-     * The pagetable walk has returned a successful translation.  Now check
-     * access rights to see whether the access should succeed.
+     * The pagetable walk has returned a successful translation (i.e.
All PTEs
+     * are present and have no reserved bits set).  Now check access
rights to
+     * see whether the access should succeed.
      */
     ar = (ar_and & AR_ACCUM_AND) | (ar_or & AR_ACCUM_OR);
 
+    /*
+     * Sanity check.  If EFER.NX is disabled, _PAGE_NX_BIT is reserved and
+     * should have caused a translation failure before we get here.
+     */
+    if ( ar & _PAGE_NX_BIT )
+        ASSERT(guest_nx_enabled(v));
+
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
     /*
      * If all access checks are thus far ok, check Protection Key for 64bit


One problem I have with an ASSERT beside the "if ( (walk &
PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )" is that it is mid-way through
the permissions checks, rather than at the start, which is likely to get
missed if future access checks get introduced ahead of the protection
key checks.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches
  2017-06-01 11:22             ` Andrew Cooper
@ 2017-06-01 12:06               ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-06-01 12:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Tim Deegan, Xen-devel

>>> On 01.06.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> On 01/06/17 11:51, Jan Beulich wrote:
>> While this perhaps is a worthwhile addition, my original request
>> really was to make more visible around the place where it matters
>> that the NX bit is part of the reserved ones when NX is off. Hence
>> I'm not sure the comment change is worthwhile, and if you dislike
>> adding the suggested ASSERT() I won't the patch be left as is.
> 
> I presume you means something like you won't mind if the patch is left
> as-is?

Oop, yes.

> How about this?
> 
> diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
> index 972364f..6055fec 100644
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -356,11 +356,19 @@ guest_walk_tables(struct vcpu *v, struct
> p2m_domain *p2m,
>      gw->pfec |= PFEC_page_present;
>  
>      /*
> -     * The pagetable walk has returned a successful translation.  Now check
> -     * access rights to see whether the access should succeed.
> +     * The pagetable walk has returned a successful translation (i.e.
> All PTEs
> +     * are present and have no reserved bits set).  Now check access
> rights to
> +     * see whether the access should succeed.
>       */
>      ar = (ar_and & AR_ACCUM_AND) | (ar_or & AR_ACCUM_OR);
>  
> +    /*
> +     * Sanity check.  If EFER.NX is disabled, _PAGE_NX_BIT is reserved and
> +     * should have caused a translation failure before we get here.
> +     */
> +    if ( ar & _PAGE_NX_BIT )
> +        ASSERT(guest_nx_enabled(v));
> +
>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>      /*
>       * If all access checks are thus far ok, check Protection Key for 64bit

That's fine, thanks.

> One problem I have with an ASSERT beside the "if ( (walk &
> PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )" is that it is mid-way through
> the permissions checks, rather than at the start, which is likely to get
> missed if future access checks get introduced ahead of the protection
> key checks.

I can understand this.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking
  2017-05-26 17:03 [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Andrew Cooper
  2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
  2017-05-26 17:03 ` [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches Andrew Cooper
@ 2017-06-01 17:55 ` Julien Grall
  2017-06-01 17:57   ` Andrew Cooper
  2 siblings, 1 reply; 21+ messages in thread
From: Julien Grall @ 2017-06-01 17:55 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich

Hi Andrew,

On 26/05/17 18:03, Andrew Cooper wrote:
> These bugfixes finally allow my comprehensive XTF test (following several
> bugfixes, and the added feature of the 0-level pagetable tests) to complete
> successfully on a Skylake Server system, with PKU.
>
> I know this is getting very tight to the line on 4.9, but it would be nice to
> get these fixes in.

Is there any risk to take this patch in Xen 4.9?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking
  2017-06-01 17:55 ` [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Julien Grall
@ 2017-06-01 17:57   ` Andrew Cooper
  2017-06-01 18:00     ` Julien Grall
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2017-06-01 17:57 UTC (permalink / raw)
  To: Julien Grall, Xen-devel; +Cc: Jan Beulich

On 01/06/17 18:55, Julien Grall wrote:
> Hi Andrew,
>
> On 26/05/17 18:03, Andrew Cooper wrote:
>> These bugfixes finally allow my comprehensive XTF test (following
>> several
>> bugfixes, and the added feature of the 0-level pagetable tests) to
>> complete
>> successfully on a Skylake Server system, with PKU.
>>
>> I know this is getting very tight to the line on 4.9, but it would be
>> nice to
>> get these fixes in.
>
> Is there any risk to take this patch in Xen 4.9?

Very little risk (if I do say so myself).

I have an XTF test which can prove the correctness of both patches.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking
  2017-06-01 17:57   ` Andrew Cooper
@ 2017-06-01 18:00     ` Julien Grall
  0 siblings, 0 replies; 21+ messages in thread
From: Julien Grall @ 2017-06-01 18:00 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich

On 01/06/17 18:57, Andrew Cooper wrote:
> On 01/06/17 18:55, Julien Grall wrote:
>> Hi Andrew,
>>
>> On 26/05/17 18:03, Andrew Cooper wrote:
>>> These bugfixes finally allow my comprehensive XTF test (following
>>> several
>>> bugfixes, and the added feature of the 0-level pagetable tests) to
>>> complete
>>> successfully on a Skylake Server system, with PKU.
>>>
>>> I know this is getting very tight to the line on 4.9, but it would be
>>> nice to
>>> get these fixes in.
>>
>> Is there any risk to take this patch in Xen 4.9?
>
> Very little risk (if I do say so myself).
>
> I have an XTF test which can prove the correctness of both patches.

It is nice to have XTF around for that :).

Release-acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-06-01 18:00 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-26 17:03 [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Andrew Cooper
2017-05-26 17:03 ` [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode" Andrew Cooper
2017-05-29  8:48   ` Jan Beulich
2017-05-31  7:09   ` Han, Huaitong
2017-05-31  7:44     ` Andrew Cooper
2017-05-31  7:56       ` Jan Beulich
2017-05-31  8:06         ` Andrew Cooper
2017-05-31  8:12           ` Jan Beulich
2017-05-31  8:14       ` Han, Huaitong
2017-06-01  2:15         ` Tian, Kevin
2017-05-26 17:03 ` [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches Andrew Cooper
2017-05-29  8:58   ` Jan Beulich
2017-05-29  9:03     ` Andrew Cooper
2017-05-29  9:15       ` Jan Beulich
2017-06-01 10:19         ` Andrew Cooper
2017-06-01 10:51           ` Jan Beulich
2017-06-01 11:22             ` Andrew Cooper
2017-06-01 12:06               ` Jan Beulich
2017-06-01 17:55 ` [PATCH RFC for-4.9 0/2] x86/pagewalk: Further bugfixes to pagetable walking Julien Grall
2017-06-01 17:57   ` Andrew Cooper
2017-06-01 18:00     ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.