* [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
From: Paolo Bonzini @ 2016-03-08 11:44 UTC
To: linux-kernel, kvm; +Cc: guangrong.xiao

I found this while testing the permission_fault patch with ept=0.

Paolo Bonzini (2):
  KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  KVM: MMU: fix reserved bit check for
    pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0

 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/mmu.c                |  4 +++-
 arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
 3 files changed, 20 insertions(+), 12 deletions(-)

--
1.8.3.1

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Paolo Bonzini @ 2016-03-08 11:44 UTC
To: linux-kernel, kvm; +Cc: guangrong.xiao, stable, Xiao Guangrong, Andy Lutomirski

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE. This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.
Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled.  We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case.  Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	guest_efer = vmx->vcpu.arch.efer;

 	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
 	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits = EFER_SCE;
 #ifdef CONFIG_X86_64
 	ignore_bits |= EFER_LMA | EFER_LME;
 	/* SCE is meaningful only in long mode on Intel */
 	if (guest_efer & EFER_LMA)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1. */
+	if (!enable_ept) {
+		guest_efer |= EFER_NX;
+		ignore_bits |= EFER_NX;
+	}

 	clear_atomic_switch_msr(vmx, MSR_EFER);

@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	 */
 	if (cpu_has_load_ia32_efer ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
 					      guest_efer, host_efer);
 		return false;
-	}
+	} else {
+		guest_efer &= ~ignore_bits;
+		guest_efer |= host_efer & ignore_bits;

-	return true;
+		vmx->guest_msrs[efer_offset].data = guest_efer;
+		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+		return true;
+	}
 }

 static unsigned long segment_base(u16 selector)
--
1.8.3.1
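[Editorial note: the U/W/NX juggling described in the commit message above
can be modeled as a small standalone sketch. The helper names and bit
handling below are illustrative only, not KVM's actual code; KVM's real
logic lives in arch/x86/kvm/mmu.c and handles many more cases.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bits: bit 1 = R/W, bit 2 = U/S, bit 63 = NX. */
#define SPTE_W  (1ull << 1)
#define SPTE_U  (1ull << 2)
#define SPTE_NX (1ull << 63)

/* Shadow PTE built for a guest pte.u=0/pte.w=0 page on a supervisor
 * write: W=1 lets the supervisor write succeed (CR0.WP=0 semantics),
 * U stays 0 so user writes still fault, and under SMEP NX=1 keeps the
 * kernel from executing the now-"kernel" page. */
static uint64_t spte_for_supervisor_write(bool cr0_wp, bool cr4_smep)
{
	uint64_t spte = 0;	/* U=0, W=0 inherited from the guest pte */

	if (!cr0_wp) {
		spte |= SPTE_W;
		if (cr4_smep)
			spte |= SPTE_NX;
	}
	return spte;
}

/* A later user read flips the bits back: U=1, W=0, NX from the gpte.
 * The next supervisor write then flips them again, and so on. */
static uint64_t spte_after_user_read(uint64_t spte, bool gpte_nx)
{
	spte |= SPTE_U;
	spte &= ~(SPTE_W | SPTE_NX);
	if (gpte_nx)
		spte |= SPTE_NX;
	return spte;
}
```

The NX=1 intermediate state is exactly why the fix is needed: if guest
EFER.NX=0 were honoured, bit 63 would be reserved and the shadow PTE
itself would fault forever.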
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Xiao Guangrong @ 2016-03-10 8:27 UTC
To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong, Andy Lutomirski

On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> Yes, all of these are needed. :) This is admittedly a bit odd, but
> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
> and of course ept=0.
>
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE. This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
> This enables user reads; it also disables supervisor writes, the next
> of which will then flip the bits again.
>
> When SMEP is in effect, however, U=0 will enable kernel execution of
> this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
> with U=0. If the guest has not enabled NX, the result is a continuous
> stream of page faults due to the NX bit being reserved.
>
> The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
> switch.

Good catch!

So it only hurts the box which has cpu_has_load_ia32_efer support otherwise
NX is inherited from kernel (kernel always sets NX if CPU supports it),
right?

> There is another bug in the reserved bit check, which I've split to a
> separate patch for easier application to stable kernels.
>
> Cc: stable@vger.kernel.org
> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virtual/kvm/mmu.txt |  3 ++-
>  arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
>  2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index daf9c0f742d2..c81731096a43 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -358,7 +358,8 @@ In the first case there are two additional complications:
>  - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
>    the kernel may now execute it.  We handle this by also setting spte.nx.
>    If we get a user fetch or read fault, we'll change spte.u=1 and
> -  spte.nx=gpte.nx back.
> +  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
> +  shadow paging is in use.
>  - if CR4.SMAP is disabled: since the page has been changed to a kernel
>    page, it can not be reused when CR4.SMAP is enabled.  We set
>    CR4.SMAP && !CR0.WP into shadow page's role to avoid this case.  Note,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6e51493ff4f9..91830809d837 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  	guest_efer = vmx->vcpu.arch.efer;
>
>  	/*
> -	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
> -	 * outside long mode
> +	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
>  	 */
> -	ignore_bits = EFER_NX | EFER_SCE;
> +	ignore_bits = EFER_SCE;
>  #ifdef CONFIG_X86_64
>  	ignore_bits |= EFER_LMA | EFER_LME;
>  	/* SCE is meaningful only in long mode on Intel */
>  	if (guest_efer & EFER_LMA)
>  		ignore_bits &= ~(u64)EFER_SCE;
>  #endif
> -	guest_efer &= ~ignore_bits;
> -	guest_efer |= host_efer & ignore_bits;
> -	vmx->guest_msrs[efer_offset].data = guest_efer;
> -	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
> +	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1. */
> +	if (!enable_ept) {
> +		guest_efer |= EFER_NX;
> +		ignore_bits |= EFER_NX;

Update ignore_bits is not necessary i think.

> +	}
>
>  	clear_atomic_switch_msr(vmx, MSR_EFER);
>
> @@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  	 */
>  	if (cpu_has_load_ia32_efer ||
>  	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
> -		guest_efer = vmx->vcpu.arch.efer;
>  		if (!(guest_efer & EFER_LMA))
>  			guest_efer &= ~EFER_LME;
>  		if (guest_efer != host_efer)
>  			add_atomic_switch_msr(vmx, MSR_EFER,
>  					      guest_efer, host_efer);

So, why not set EFER_NX (if !ept) just in this branch to make the fix
more simpler?
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Paolo Bonzini @ 2016-03-10 10:01 UTC
To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski

On 10/03/2016 09:27, Xiao Guangrong wrote:
> So it only hurts the box which has cpu_has_load_ia32_efer support otherwise
> NX is inherited from kernel (kernel always sets NX if CPU supports it),
> right?

Yes, but I think !cpu_has_load_ia32_efer && SMEP does not exist.

On the other hand it's really only when disabling ept, so it's a weird
corner case that only happens during testing.

Paolo
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Paolo Bonzini @ 2016-03-10 10:09 UTC
To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski

On 10/03/2016 09:27, Xiao Guangrong wrote:
>> +	if (!enable_ept) {
>> +		guest_efer |= EFER_NX;
>> +		ignore_bits |= EFER_NX;
>
> Update ignore_bits is not necessary i think.

More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
guest EFER.NX=1 and host EFER.NX=0.

This is what I came up with (plus some comments :)):

	u64 guest_efer = vmx->vcpu.arch.efer;
	u64 ignore_bits = 0;

	if (!enable_ept) {
		if (boot_cpu_has(X86_FEATURE_SMEP))
			guest_efer |= EFER_NX;
		else if (!(guest_efer & EFER_NX))
			ignore_bits |= EFER_NX;
	}

>> -	guest_efer = vmx->vcpu.arch.efer;
>> 	if (!(guest_efer & EFER_LMA))
>> 		guest_efer &= ~EFER_LME;
>> 	if (guest_efer != host_efer)
>> 		add_atomic_switch_msr(vmx, MSR_EFER,
>> 				      guest_efer, host_efer);
>
> So, why not set EFER_NX (if !ept) just in this branch to make the fix
> more simpler?

I didn't like having

	guest_efer = vmx->vcpu.arch.efer;
	...
	if (!enable_ept)
		guest_efer |= EFER_NX;
	guest_efer &= ~ignore_bits;
	guest_efer |= host_efer & ignore_bits;
	...
	if (...) {
		guest_efer = vmx->vcpu.arch.efer;
		if (!enable_ept)
			guest_efer |= EFER_NX;
		...
	}

My patch is bigger but the resulting code is smaller and easier to follow:

	guest_efer = vmx->vcpu.arch.efer;
	if (!enable_ept)
		guest_efer |= EFER_NX;
	...
	if (...) {
		...
	} else {
		guest_efer &= ~ignore_bits;
		guest_efer |= host_efer & ignore_bits;
	}

Paolo
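[Editorial note: the masking scheme discussed above can be condensed into
a compilable model. This is a sketch covering only the user-return-MSR
path (the atomic-switch path is omitted); the function name and the
cpu_has_smep parameter are illustrative, not KVM's API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ull << 0)
#define EFER_LME (1ull << 8)
#define EFER_LMA (1ull << 10)
#define EFER_NX  (1ull << 11)

/* Compute the EFER value loaded for the guest: bits in ignore_bits are
 * inherited from the host, everything else comes from the guest.  With
 * the fix, EFER.NX is forced on under shadow paging + SMEP so the NX
 * bits KVM plants in shadow PTEs are always honoured. */
static uint64_t guest_msr_efer(uint64_t guest_efer, uint64_t host_efer,
			       bool enable_ept, bool cpu_has_smep)
{
	uint64_t ignore_bits = 0;

	if (!enable_ept) {
		if (cpu_has_smep)
			guest_efer |= EFER_NX;	/* shadow sptes may set NX */
		else if (!(guest_efer & EFER_NX))
			ignore_bits |= EFER_NX;	/* inherit host EFER.NX */
	}

	/* LMA/LME are handled by hardware; SCE matters only in long mode. */
	ignore_bits |= EFER_LMA | EFER_LME;
	if (!(guest_efer & EFER_LMA))
		ignore_bits |= EFER_SCE;

	guest_efer &= ~ignore_bits;
	guest_efer |= host_efer & ignore_bits;
	return guest_efer;
}
```

For example, a guest with EFER.NX=0 under shadow paging on an SMEP host
still ends up running with EFER.NX=1, which is the point of the fix.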
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Xiao Guangrong @ 2016-03-10 12:14 UTC
To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Andy Lutomirski

On 03/10/2016 06:09 PM, Paolo Bonzini wrote:
> On 10/03/2016 09:27, Xiao Guangrong wrote:
>>> +	if (!enable_ept) {
>>> +		guest_efer |= EFER_NX;
>>> +		ignore_bits |= EFER_NX;
>>
>> Update ignore_bits is not necessary i think.
>
> More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
> not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
> guest EFER.NX=1 and host EFER.NX=0.

It is not in linux, the kernel always set EFER.NX if CPUID reports it,
arch/x86/kernel/head_64.S:

204	/* Setup EFER (Extended Feature Enable Register) */
205	movl	$MSR_EFER, %ecx
206	rdmsr
207	btsl	$_EFER_SCE, %eax	/* Enable System Call */
208	btl	$20,%edi		/* No Execute supported? */
209	jnc	1f
210	btsl	$_EFER_NX, %eax
211	btsq	$_PAGE_BIT_NX,early_pmd_flags(%rip)
212 1:	wrmsr				/* Make changes effective */

So if guest sees NX in its cpuid then host EFER.NX should be 1.

> This is what I came up with (plus some comments :)):
>
>	u64 guest_efer = vmx->vcpu.arch.efer;
>	u64 ignore_bits = 0;
>
>	if (!enable_ept) {
>		if (boot_cpu_has(X86_FEATURE_SMEP))
>			guest_efer |= EFER_NX;
>		else if (!(guest_efer & EFER_NX))
>			ignore_bits |= EFER_NX;
>	}

Your logic is very right. What my suggestion is we can keep

	ignore_bits = EFER_NX | EFER_SCE;

(needn't conditionally adjust it) because EFER_NX must be the same
between guest and host if we switch EFER manually.

> My patch is bigger but the resulting code is smaller and easier to follow:
>
>	guest_efer = vmx->vcpu.arch.efer;
>	if (!enable_ept)
>		guest_efer |= EFER_NX;
>	...
>	if (...) {
>		...
>	} else {
>		guest_efer &= ~ignore_bits;
>		guest_efer |= host_efer & ignore_bits;
>	}

I agreed. :)
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Paolo Bonzini @ 2016-03-10 12:26 UTC
To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski

On 10/03/2016 13:14, Xiao Guangrong wrote:
>> More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
>> not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
>> guest EFER.NX=1 and host EFER.NX=0.
>
> It is not in linux, the kernel always set EFER.NX if CPUID reports it,
> arch/x86/kernel/head_64.S:
>
> 204	/* Setup EFER (Extended Feature Enable Register) */
> 205	movl	$MSR_EFER, %ecx
> 206	rdmsr
> 207	btsl	$_EFER_SCE, %eax	/* Enable System Call */
> 208	btl	$20,%edi		/* No Execute supported? */
> 209	jnc	1f
> 210	btsl	$_EFER_NX, %eax
> 211	btsq	$_PAGE_BIT_NX,early_pmd_flags(%rip)
> 212 1:	wrmsr				/* Make changes effective */
>
> So if guest sees NX in its cpuid then host EFER.NX should be 1.

You're right.  It's just in theory.  But ignoring EFER.NX when it is 1
is technically not correct; since we have to add some special EFER_NX
logic anyway, I preferred to make it pedantically right. :)

Paolo
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Xiao Guangrong @ 2016-03-10 8:46 UTC
To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong, Andy Lutomirski

On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> Yes, all of these are needed. :) This is admittedly a bit odd, but
> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
> and of course ept=0.
>
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE. This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.

BTW, it should be pte.u = 1 where you mentioned above.
* Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
From: Paolo Bonzini @ 2016-03-10 10:03 UTC
To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable, Andy Lutomirski

On 10/03/2016 09:46, Xiao Guangrong wrote:
>> Yes, all of these are needed. :) This is admittedly a bit odd, but
>> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
>> and of course ept=0.
>>
>> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
>> setting U=0 and W=1 in the shadow PTE. This will cause a user write
>> to fault and a supervisor write to succeed (which is correct because
>> CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
>
> BTW, it should be pte.u = 1 where you mentioned above.

Ok, will fix.

Paolo
* [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
From: Paolo Bonzini @ 2016-03-08 11:44 UTC
To: linux-kernel, kvm; +Cc: guangrong.xiao, stable, Xiao Guangrong

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE. This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, pte.u=0 will enable kernel execution
of this page. To avoid this, KVM also sets pte.nx=1. The reserved bit
catches this because it only looks at the guest's EFER.NX bit. Teach it
that smep_andnot_wp will also use the NX bit of SPTEs.

Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
Fixes: c258b62b264fdc469b6d3610a907708068145e3b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95a955de5964..0cd4ee01de94 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
+	int uses_nx = context->nx || context->base_role.smep_andnot_wp;
+
 	/*
 	 * Passing "true" to the last argument is okay; it adds a check
 	 * on bit 8 of the SPTEs which KVM doesn't use anyway.
 	 */
 	__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
 				boot_cpu_data.x86_phys_bits,
-				context->shadow_root_level, context->nx,
+				context->shadow_root_level, uses_nx,
 				guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
 				true);
 }
--
1.8.3.1
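[Editorial note: the effect of the one-line uses_nx change can be sketched
in isolation. This is a toy model with hypothetical helper names, not the
kernel's actual reserved-bit machinery.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_NX (1ull << 63)

/* If NX is treated as unused, bit 63 is reserved and any shadow PTE
 * with NX=1 trips the reserved-bit check, producing the endless page
 * faults described in the cover letter.  smep_andnot_wp shadow pages
 * set NX=1 in SPTEs even when guest EFER.NX=0, so the check must
 * consider NX "in use" for them as well. */
static bool spte_rsvd_bits_set(uint64_t spte, bool guest_nx,
			       bool smep_andnot_wp)
{
	bool uses_nx = guest_nx || smep_andnot_wp;	/* the fix */

	return !uses_nx && (spte & SPTE_NX);
}
```

With the fix, an NX=1 shadow PTE created for a smep_andnot_wp role is no
longer flagged as malformed even though the guest never enabled EFER.NX.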
* Re: [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
From: Xiao Guangrong @ 2016-03-10 8:36 UTC
To: Paolo Bonzini, linux-kernel, kvm; +Cc: stable, Xiao Guangrong

On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE. This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
> This enables user reads; it also disables supervisor writes, the next
> of which will then flip the bits again.
>
> When SMEP is in effect, however, pte.u=0 will enable kernel execution
> of this page. To avoid this, KVM also sets pte.nx=1. The reserved bit
> catches this because it only looks at the guest's EFER.NX bit. Teach it
> that smep_andnot_wp will also use the NX bit of SPTEs.
>
> Cc: stable@vger.kernel.org
> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>

As a redhat guy i am so proud. :)

> Fixes: c258b62b264fdc469b6d3610a907708068145e3b

Thanks for you fixing it, Paolo!

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 95a955de5964..0cd4ee01de94 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
>  void
>  reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>  {
> +	int uses_nx = context->nx || context->base_role.smep_andnot_wp;

It would be better if it is 'bool'
* Re: [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
From: Paolo Bonzini @ 2016-03-10 10:02 UTC
To: Xiao Guangrong, linux-kernel, kvm; +Cc: stable

On 10/03/2016 09:36, Xiao Guangrong wrote:
> On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
>> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
>> setting U=0 and W=1 in the shadow PTE. This will cause a user write
>> to fault and a supervisor write to succeed (which is correct because
>> CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
>> This enables user reads; it also disables supervisor writes, the next
>> of which will then flip the bits again.
>>
>> When SMEP is in effect, however, pte.u=0 will enable kernel execution
>> of this page. To avoid this, KVM also sets pte.nx=1. The reserved bit
>> catches this because it only looks at the guest's EFER.NX bit. Teach it
>> that smep_andnot_wp will also use the NX bit of SPTEs.
>>
>> Cc: stable@vger.kernel.org
>> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
>
> As a redhat guy i am so proud. :)
>
>> Fixes: c258b62b264fdc469b6d3610a907708068145e3b
>
> Thanks for you fixing it, Paolo!
>
> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/mmu.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 95a955de5964..0cd4ee01de94 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3721,13 +3721,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
>>  void
>>  reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>>  {
>> +	int uses_nx = context->nx || context->base_role.smep_andnot_wp;
>
> It would be better if it is 'bool'

Ok, will do.

Paolo