From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: guangrong.xiao@linux.intel.com, stable@vger.kernel.org,
Xiao Guangrong <guangrong.xiao@redhat.com>,
Andy Lutomirski <luto@amacapital.net>
Subject: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Date: Tue, 8 Mar 2016 12:44:26 +0100 [thread overview]
Message-ID: <1457437467-65707-2-git-send-email-pbonzini@redhat.com> (raw)
In-Reply-To: <1457437467-65707-1-git-send-email-pbonzini@redhat.com>
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.
KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE. This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0). A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.
When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.
The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.
There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.
Cc: stable@vger.kernel.org
Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Documentation/virtual/kvm/mmu.txt | 3 ++-
arch/x86/kvm/vmx.c | 25 +++++++++++++++----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
- if CR4.SMEP is enabled: since we've turned the page into a kernel page,
the kernel may now execute it. We handle this by also setting spte.nx.
If we get a user fetch or read fault, we'll change spte.u=1 and
- spte.nx=gpte.nx back.
+ spte.nx=gpte.nx back. For this to work, KVM forces EFER.NX to 1 when
+ shadow paging is in use.
- if CR4.SMAP is disabled: since the page has been changed to a kernel
page, it can not be reused when CR4.SMAP is enabled. We set
CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
guest_efer = vmx->vcpu.arch.efer;
/*
- * NX is emulated; LMA and LME handled by hardware; SCE meaningless
- * outside long mode
+ * LMA and LME handled by hardware; SCE meaningless outside long mode.
*/
- ignore_bits = EFER_NX | EFER_SCE;
+ ignore_bits = EFER_SCE;
#ifdef CONFIG_X86_64
ignore_bits |= EFER_LMA | EFER_LME;
/* SCE is meaningful only in long mode on Intel */
if (guest_efer & EFER_LMA)
ignore_bits &= ~(u64)EFER_SCE;
#endif
- guest_efer &= ~ignore_bits;
- guest_efer |= host_efer & ignore_bits;
- vmx->guest_msrs[efer_offset].data = guest_efer;
- vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+ /* NX is needed to handle CR0.WP=1, CR4.SMEP=1. */
+ if (!enable_ept) {
+ guest_efer |= EFER_NX;
+ ignore_bits |= EFER_NX;
+ }
clear_atomic_switch_msr(vmx, MSR_EFER);
@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
*/
if (cpu_has_load_ia32_efer ||
(enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
- guest_efer = vmx->vcpu.arch.efer;
if (!(guest_efer & EFER_LMA))
guest_efer &= ~EFER_LME;
if (guest_efer != host_efer)
add_atomic_switch_msr(vmx, MSR_EFER,
guest_efer, host_efer);
return false;
- }
+ } else {
+ guest_efer &= ~ignore_bits;
+ guest_efer |= host_efer & ignore_bits;
- return true;
+ vmx->guest_msrs[efer_offset].data = guest_efer;
+ vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+ return true;
+ }
}
static unsigned long segment_base(u16 selector)
--
1.8.3.1
next prev parent reply other threads:[~2016-03-08 11:46 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-08 11:44 [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
2016-03-08 11:44 ` Paolo Bonzini [this message]
2016-03-10 8:27 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Xiao Guangrong
2016-03-10 10:01 ` Paolo Bonzini
2016-03-10 10:09 ` Paolo Bonzini
2016-03-10 12:14 ` Xiao Guangrong
2016-03-10 12:26 ` Paolo Bonzini
2016-03-10 8:46 ` Xiao Guangrong
2016-03-10 10:03 ` Paolo Bonzini
2016-03-08 11:44 ` [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
2016-03-10 8:36 ` Xiao Guangrong
2016-03-10 10:02 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1457437467-65707-2-git-send-email-pbonzini@redhat.com \
--to=pbonzini@redhat.com \
--cc=guangrong.xiao@linux.intel.com \
--cc=guangrong.xiao@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).