* [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
@ 2016-03-08 11:44 Paolo Bonzini
  2016-03-08 11:44 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Paolo Bonzini
  2016-03-08 11:44 ` [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
  0 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:44 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: guangrong.xiao

I found this while testing the permission_fault patch with ept=0.

Paolo Bonzini (2):
  KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
    combo
  KVM: MMU: fix reserved bit check for
    pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0

 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/mmu.c                |  4 +++-
 arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
 3 files changed, 20 insertions(+), 12 deletions(-)

-- 
1.8.3.1


* [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-08 11:44 [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
@ 2016-03-08 11:44 ` Paolo Bonzini
  2016-03-10  8:27   ` Xiao Guangrong
  2016-03-10  8:46   ` Xiao Guangrong
  2016-03-08 11:44 ` [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
  1 sibling, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:44 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: guangrong.xiao, stable, Xiao Guangrong, Andy Lutomirski

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.
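
For illustration, a stand-alone toy model of the two shadow-PTE states
involved in this flipping; the struct and helper names below are invented
for the sketch and are not KVM code:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the shadow-PTE flipping described above (not KVM code). */
struct toy_spte {
        bool u;         /* user accesses allowed */
        bool w;         /* writes allowed */
};

/* Supervisor-write state: U=0/W=1, so user accesses fault. */
static void to_supervisor_writable(struct toy_spte *s)
{
        s->u = false;
        s->w = true;
}

/* User-read state: U=1/W=0, so the next supervisor write faults. */
static void to_user_readable(struct toy_spte *s)
{
        s->u = true;
        s->w = false;
}

int main(void)
{
        struct toy_spte s;

        to_supervisor_writable(&s);     /* supervisor write succeeds */
        printf("U=%d W=%d\n", s.u, s.w);
        to_user_readable(&s);           /* user read faulted -> flip */
        printf("U=%d W=%d\n", s.u, s.w);
        to_supervisor_writable(&s);     /* supervisor write faulted -> flip back */
        printf("U=%d W=%d\n", s.u, s.w);
        return 0;
}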

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.
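
To see why forcing EFER.NX=1 helps, here is a stand-alone toy model of the
hardware rule involved (not real hardware or KVM code): with EFER.NX=0 in
effect while the guest runs, bit 63 of a present PTE is reserved, so the
shadow PTE built above faults forever; with EFER.NX=1 the bit is legitimate.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model: if EFER.NX=0, bit 63 (NX/XD) of a present PTE is reserved
 * and any access through it causes a reserved-bit page fault. */
static bool rsvd_fault(uint64_t pte, bool efer_nx)
{
        return !efer_nx && (pte & (1ULL << 63));
}

int main(void)
{
        uint64_t spte = (1ULL << 63) | 0x3;     /* present, writable, NX set by KVM */

        printf("EFER.NX=0: %s\n", rsvd_fault(spte, false) ? "RSVD #PF" : "ok");
        printf("EFER.NX=1: %s\n", rsvd_fault(spte, true)  ? "RSVD #PF" : "ok");
        return 0;
}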

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled. We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	guest_efer = vmx->vcpu.arch.efer;
 
 	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
 	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits = EFER_SCE;
 #ifdef CONFIG_X86_64
 	ignore_bits |= EFER_LMA | EFER_LME;
 	/* SCE is meaningful only in long mode on Intel */
 	if (guest_efer & EFER_LMA)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */
+	if (!enable_ept) {
+		guest_efer |= EFER_NX;
+		ignore_bits |= EFER_NX;
+	}
 
 	clear_atomic_switch_msr(vmx, MSR_EFER);
 
@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	 */
 	if (cpu_has_load_ia32_efer ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
 					      guest_efer, host_efer);
 		return false;
-	}
+	} else {
+		guest_efer &= ~ignore_bits;
+		guest_efer |= host_efer & ignore_bits;
 
-	return true;
+		vmx->guest_msrs[efer_offset].data = guest_efer;
+		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+		return true;
+	}
 }
 
 static unsigned long segment_base(u16 selector)
-- 
1.8.3.1


* [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  2016-03-08 11:44 [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
  2016-03-08 11:44 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Paolo Bonzini
@ 2016-03-08 11:44 ` Paolo Bonzini
  2016-03-10  8:36   ` Xiao Guangrong
  1 sibling, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:44 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: guangrong.xiao, stable, Xiao Guangrong

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, pte.u=0 will enable kernel execution
of this page.  To avoid this, KVM also sets pte.nx=1.  The reserved bit
check catches this because it only looks at the guest's EFER.NX bit.  Teach it
that smep_andnot_wp will also use the NX bit of SPTEs.
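
As a rough stand-alone illustration of the check being adjusted (the helper
below is invented for this sketch and is far simpler than
__reset_rsvds_bits_mask):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented helper: the shadow reserved-bits mask treats bit 63 (NX)
 * as reserved unless the MMU context is allowed to use it. */
static uint64_t shadow_rsvd_mask(bool uses_nx)
{
        return uses_nx ? 0 : (1ULL << 63);
}

int main(void)
{
        uint64_t spte = (1ULL << 63) | 0x1003;  /* NX set by the SMEP workaround */
        bool guest_nx = false;                  /* guest EFER.NX=0 */
        bool smep_andnot_wp = true;             /* CR4.SMEP=1, CR0.WP=0 role */

        /* Before the fix: only guest EFER.NX is consulted, so the SPTE
         * that KVM itself created is reported as having reserved bits. */
        printf("old: %s\n",
               (spte & shadow_rsvd_mask(guest_nx)) ? "flagged" : "ok");

        /* After the fix: smep_andnot_wp also enables NX in the check. */
        printf("new: %s\n",
               (spte & shadow_rsvd_mask(guest_nx || smep_andnot_wp)) ?
               "flagged" : "ok");
        return 0;
}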

Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
Fixes: c258b62b264fdc469b6d3610a907708068145e3b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95a955de5964..0cd4ee01de94 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
+	int uses_nx = context->nx || context->base_role.smep_andnot_wp;
+
 	/*
 	 * Passing "true" to the last argument is okay; it adds a check
 	 * on bit 8 of the SPTEs which KVM doesn't use anyway.
 	 */
 	__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
 				boot_cpu_data.x86_phys_bits,
-				context->shadow_root_level, context->nx,
+				context->shadow_root_level, uses_nx,
 				guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
 				true);
 }
-- 
1.8.3.1


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-08 11:44 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Paolo Bonzini
@ 2016-03-10  8:27   ` Xiao Guangrong
  2016-03-10 10:01     ` Paolo Bonzini
  2016-03-10 10:09     ` Paolo Bonzini
  2016-03-10  8:46   ` Xiao Guangrong
  1 sibling, 2 replies; 12+ messages in thread
From: Xiao Guangrong @ 2016-03-10  8:27 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong, Andy Lutomirski



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> Yes, all of these are needed. :) This is admittedly a bit odd, but
> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
> and of course ept=0.
>
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
> This enables user reads; it also disables supervisor writes, the next
> of which will then flip the bits again.
>
> When SMEP is in effect, however, U=0 will enable kernel execution of
> this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
> with U=0.  If the guest has not enabled NX, the result is a continuous
> stream of page faults due to the NX bit being reserved.
>
> The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
> switch.

Good catch!

So it only hurts boxes that have cpu_has_load_ia32_efer support; otherwise
NX is inherited from the kernel (the kernel always sets NX if the CPU
supports it), right?

>
> There is another bug in the reserved bit check, which I've split to a
> separate patch for easier application to stable kernels.
>

> Cc: stable@vger.kernel.org
> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   Documentation/virtual/kvm/mmu.txt |  3 ++-
>   arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
>   2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index daf9c0f742d2..c81731096a43 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -358,7 +358,8 @@ In the first case there are two additional complications:
>   - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
>     the kernel may now execute it.  We handle this by also setting spte.nx.
>     If we get a user fetch or read fault, we'll change spte.u=1 and
> -  spte.nx=gpte.nx back.
> +  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
> +  shadow paging is in use.
>   - if CR4.SMAP is disabled: since the page has been changed to a kernel
>     page, it can not be reused when CR4.SMAP is enabled. We set
>     CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6e51493ff4f9..91830809d837 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>   	guest_efer = vmx->vcpu.arch.efer;
>
>   	/*
> -	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
> -	 * outside long mode
> +	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
>   	 */
> -	ignore_bits = EFER_NX | EFER_SCE;
> +	ignore_bits = EFER_SCE;
>   #ifdef CONFIG_X86_64
>   	ignore_bits |= EFER_LMA | EFER_LME;
>   	/* SCE is meaningful only in long mode on Intel */
>   	if (guest_efer & EFER_LMA)
>   		ignore_bits &= ~(u64)EFER_SCE;
>   #endif
> -	guest_efer &= ~ignore_bits;
> -	guest_efer |= host_efer & ignore_bits;
> -	vmx->guest_msrs[efer_offset].data = guest_efer;
> -	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
> +	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */

> +	if (!enable_ept) {
> +		guest_efer |= EFER_NX;
> +		ignore_bits |= EFER_NX;

Updating ignore_bits is not necessary, I think.

> +	}
>
>   	clear_atomic_switch_msr(vmx, MSR_EFER);
>
> @@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>   	 */
>   	if (cpu_has_load_ia32_efer ||
>   	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
> -		guest_efer = vmx->vcpu.arch.efer;
>   		if (!(guest_efer & EFER_LMA))
>   			guest_efer &= ~EFER_LME;
>   		if (guest_efer != host_efer)
>   			add_atomic_switch_msr(vmx, MSR_EFER,
>   					      guest_efer, host_efer);

So, why not set EFER_NX (if !ept) just in this branch to make the fix simpler?


* Re: [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  2016-03-08 11:44 ` [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
@ 2016-03-10  8:36   ` Xiao Guangrong
  2016-03-10 10:02     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Xiao Guangrong @ 2016-03-10  8:36 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
> This enables user reads; it also disables supervisor writes, the next
> of which will then flip the bits again.
>
> When SMEP is in effect, however, pte.u=0 will enable kernel execution
> of this page.  To avoid this, KVM also sets pte.nx=1.  The reserved bit
> check catches this because it only looks at the guest's EFER.NX bit.  Teach it
> that smep_andnot_wp will also use the NX bit of SPTEs.
>
> Cc: stable@vger.kernel.org
> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>

As a Red Hat guy I am so proud. :)

> Fixes: c258b62b264fdc469b6d3610a907708068145e3b

Thanks for fixing it, Paolo!

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/mmu.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 95a955de5964..0cd4ee01de94 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
>   void
>   reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>   {
> +	int uses_nx = context->nx || context->base_role.smep_andnot_wp;

It would be better if it were 'bool'.


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-08 11:44 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Paolo Bonzini
  2016-03-10  8:27   ` Xiao Guangrong
@ 2016-03-10  8:46   ` Xiao Guangrong
  2016-03-10 10:03     ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Xiao Guangrong @ 2016-03-10  8:46 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong, Andy Lutomirski



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> Yes, all of these are needed. :) This is admittedly a bit odd, but
> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
> and of course ept=0.
>
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.

BTW, it should be pte.u=1 where you mention pte.u=0 above.


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-10  8:27   ` Xiao Guangrong
@ 2016-03-10 10:01     ` Paolo Bonzini
  2016-03-10 10:09     ` Paolo Bonzini
  1 sibling, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-10 10:01 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski



On 10/03/2016 09:27, Xiao Guangrong wrote:
> So it only hurts boxes that have cpu_has_load_ia32_efer support; otherwise
> NX is inherited from the kernel (the kernel always sets NX if the CPU
> supports it), right?

Yes, but I think a CPU with !cpu_has_load_ia32_efer && SMEP does not exist.
On the other hand it really only matters when ept is disabled, so it's a
weird corner case that only happens during testing.

Paolo


* Re: [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  2016-03-10  8:36   ` Xiao Guangrong
@ 2016-03-10 10:02     ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-10 10:02 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable



On 10/03/2016 09:36, Xiao Guangrong wrote:
> 
> 
> On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
>> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
>> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
>> to fault and a supervisor write to succeed (which is correct because
>> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
>> This enables user reads; it also disables supervisor writes, the next
>> of which will then flip the bits again.
>>
>> When SMEP is in effect, however, pte.u=0 will enable kernel execution
>> of this page.  To avoid this, KVM also sets pte.nx=1.  The reserved bit
>> check catches this because it only looks at the guest's EFER.NX bit.  Teach it
>> that smep_andnot_wp will also use the NX bit of SPTEs.
>>
>> Cc: stable@vger.kernel.org
>> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
> 
> As a Red Hat guy I am so proud. :)
> 
>> Fixes: c258b62b264fdc469b6d3610a907708068145e3b
> 
> Thanks for fixing it, Paolo!
> 
> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> 
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/mmu.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 95a955de5964..0cd4ee01de94 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct
>> kvm_vcpu *vcpu,
>>   void
>>   reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu
>> *context)
>>   {
>> +    int uses_nx = context->nx || context->base_role.smep_andnot_wp;
> 
> It would be better if it were 'bool'.

Ok, will do.

Paolo


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-10  8:46   ` Xiao Guangrong
@ 2016-03-10 10:03     ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-10 10:03 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski



On 10/03/2016 09:46, Xiao Guangrong wrote:
> 
>> Yes, all of these are needed. :) This is admittedly a bit odd, but
>> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
>> and of course ept=0.
>>
>> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
>> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
>> to fault and a supervisor write to succeed (which is correct because
>> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
> 
> BTW, it should be pte.u=1 where you mention pte.u=0 above.

Ok, will fix.

Paolo


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-10  8:27   ` Xiao Guangrong
  2016-03-10 10:01     ` Paolo Bonzini
@ 2016-03-10 10:09     ` Paolo Bonzini
  2016-03-10 12:14       ` Xiao Guangrong
  1 sibling, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-10 10:09 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski



On 10/03/2016 09:27, Xiao Guangrong wrote:
>>
> 
>> +    if (!enable_ept) {
>> +        guest_efer |= EFER_NX;
>> +        ignore_bits |= EFER_NX;
> 
> Updating ignore_bits is not necessary, I think.

More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
not in this CR0.WP=0/CR4.SMEP=1 situation.  In theory you could have
guest EFER.NX=1 and host EFER.NX=0.

This is what I came up with (plus some comments :)):

	u64 guest_efer = vmx->vcpu.arch.efer;
	u64 ignore_bits = 0;

	if (!enable_ept) {
		if (boot_cpu_has(X86_FEATURE_SMEP))
			guest_efer |= EFER_NX;
		else if (!(guest_efer & EFER_NX))
			ignore_bits |= EFER_NX;
	}

>> -        guest_efer = vmx->vcpu.arch.efer;
>>           if (!(guest_efer & EFER_LMA))
>>               guest_efer &= ~EFER_LME;
>>           if (guest_efer != host_efer)
>>               add_atomic_switch_msr(vmx, MSR_EFER,
>>                             guest_efer, host_efer);
> 
> So, why not set EFER_NX (if !ept) just in this branch to make the fix
> simpler?

I didn't like having

	guest_efer = vmx->vcpu.arch.efer;
	...
	if (!enable_ept)
		guest_efer |= EFER_NX;
	guest_efer &= ~ignore_bits;
	guest_efer |= host_efer & ignore_bits;
	...
	if (...) {
		guest_efer = vmx->vcpu.arch.efer;
		if (!enable_ept)
			guest_efer |= EFER_NX;
		...
	}

My patch is bigger but the resulting code is smaller and easier to follow:

	guest_efer = vmx->vcpu.arch.efer;
	if (!enable_ept)
		guest_efer |= EFER_NX;
	...
	if (...) {
		...
	} else {
		guest_efer &= ~ignore_bits;
		guest_efer |= host_efer & ignore_bits;
	}

Paolo


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-10 10:09     ` Paolo Bonzini
@ 2016-03-10 12:14       ` Xiao Guangrong
  2016-03-10 12:26         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Xiao Guangrong @ 2016-03-10 12:14 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Andy Lutomirski



On 03/10/2016 06:09 PM, Paolo Bonzini wrote:
>
>
> On 10/03/2016 09:27, Xiao Guangrong wrote:
>>>
>>
>>> +    if (!enable_ept) {
>>> +        guest_efer |= EFER_NX;
>>> +        ignore_bits |= EFER_NX;
>>
>> Updating ignore_bits is not necessary, I think.
>
> More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
> not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
> guest EFER.NX=1 and host EFER.NX=0.

That is not the case on Linux: the kernel always sets EFER.NX if CPUID
reports it, see arch/x86/kernel/head_64.S:

204         /* Setup EFER (Extended Feature Enable Register) */
205         movl    $MSR_EFER, %ecx
206         rdmsr
207         btsl    $_EFER_SCE, %eax        /* Enable System Call */
208         btl     $20,%edi                /* No Execute supported? */
209         jnc     1f
210         btsl    $_EFER_NX, %eax
211         btsq    $_PAGE_BIT_NX,early_pmd_flags(%rip)
212 1:      wrmsr                           /* Make changes effective */

So if the guest sees NX in its CPUID, then the host's EFER.NX should be 1.

>
> This is what I came up with (plus some comments :)):
>
> 	u64 guest_efer = vmx->vcpu.arch.efer;
> 	u64 ignore_bits = 0;
>
> 	if (!enable_ept) {
> 		if (boot_cpu_has(X86_FEATURE_SMEP))
> 			guest_efer |= EFER_NX;
> 		else if (!(guest_efer & EFER_NX))
> 			ignore_bits |= EFER_NX;
> 	}

Your logic is correct.

My suggestion is that we can keep ignore_bits = EFER_NX | EFER_SCE;
(no need to adjust it conditionally) because EFER_NX must be the same
between guest and host if we switch EFER manually.
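
To spell the point out with a stand-alone sketch of the masking formula used
on the manual-switch path (made-up values, not kernel code):

#include <stdint.h>
#include <stdio.h>

#define EFER_SCE (1ULL << 0)
#define EFER_NX  (1ULL << 11)

/* Manual-switch path: bits in ignore_bits are taken from the host,
 * the rest from the guest. */
static uint64_t effective_efer(uint64_t guest, uint64_t host, uint64_t ignore)
{
        return (guest & ~ignore) | (host & ignore);
}

int main(void)
{
        uint64_t host_efer  = EFER_NX | EFER_SCE;       /* Linux host: NX always on */
        uint64_t guest_efer = EFER_NX;                  /* guest with NX in CPUID */
        uint64_t ignore     = EFER_NX | EFER_SCE;       /* NX kept in ignore_bits */

        /* Guest still runs with NX=1, because host and guest agree on it. */
        printf("NX=%d\n",
               !!(effective_efer(guest_efer, host_efer, ignore) & EFER_NX));
        return 0;
}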

> My patch is bigger but the resulting code is smaller and easier to follow:
>
> 	guest_efer = vmx->vcpu.arch.efer;
> 	if (!enable_ept)
> 		guest_efer |= EFER_NX;
> 	...
> 	if (...) {
> 		...
> 	} else {
> 		guest_efer &= ~ignore_bits;
> 		guest_efer |= host_efer & ignore_bits;
> 	}

Agreed. :)


* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  2016-03-10 12:14       ` Xiao Guangrong
@ 2016-03-10 12:26         ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-03-10 12:26 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski



On 10/03/2016 13:14, Xiao Guangrong wrote:
>> More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
> not in this CR0.WP=0/CR4.SMEP=1 situation.  In theory you could have
>> guest EFER.NX=1 and host EFER.NX=0.
> 
> That is not the case on Linux: the kernel always sets EFER.NX if CPUID
> reports it, see arch/x86/kernel/head_64.S:
> 
> 204         /* Setup EFER (Extended Feature Enable Register) */
> 205         movl    $MSR_EFER, %ecx
> 206         rdmsr
> 207         btsl    $_EFER_SCE, %eax        /* Enable System Call */
> 208         btl     $20,%edi                /* No Execute supported? */
> 209         jnc     1f
> 210         btsl    $_EFER_NX, %eax
> 211         btsq    $_PAGE_BIT_NX,early_pmd_flags(%rip)
> 212 1:      wrmsr                           /* Make changes effective */
> 
> So if the guest sees NX in its CPUID, then the host's EFER.NX should be 1.

You're right.  It's just in theory.  But ignoring EFER.NX when it is 1
is technically not correct; since we have to add some special EFER_NX
logic anyway, I preferred to make it pedantically right. :)

Paolo

