All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH 4.4 10/50] KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Date: Mon, 14 Mar 2016 10:50:28 -0700	[thread overview]
Message-ID: <20160314175014.873643926@linuxfoundation.org> (raw)
In-Reply-To: <20160314175013.403628835@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 844a5fe219cf472060315971e15cbf97674a3324 upstream.

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution.  This will still cause a user write to fault, while
supervisor writes will succeed.  User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/virtual/kvm/mmu.txt |    3 ++-
 arch/x86/kvm/vmx.c                |   36 +++++++++++++++++++++++-------------
 2 files changed, 25 insertions(+), 14 deletions(-)

--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two addition
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled. We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1792,26 +1792,31 @@ static void reload_tss(void)
 
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
-	u64 guest_efer;
-	u64 ignore_bits;
+	u64 guest_efer = vmx->vcpu.arch.efer;
+	u64 ignore_bits = 0;
 
-	guest_efer = vmx->vcpu.arch.efer;
+	if (!enable_ept) {
+		/*
+		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
+		 * host CPUID is more efficient than testing guest CPUID
+		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SMEP))
+			guest_efer |= EFER_NX;
+		else if (!(guest_efer & EFER_NX))
+			ignore_bits |= EFER_NX;
+	}
 
 	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
 	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits |= EFER_SCE;
 #ifdef CONFIG_X86_64
 	ignore_bits |= EFER_LMA | EFER_LME;
 	/* SCE is meaningful only in long mode on Intel */
 	if (guest_efer & EFER_LMA)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 
 	clear_atomic_switch_msr(vmx, MSR_EFER);
 
@@ -1822,16 +1827,21 @@ static bool update_transition_efer(struc
 	 */
 	if (cpu_has_load_ia32_efer ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
 					      guest_efer, host_efer);
 		return false;
-	}
+	} else {
+		guest_efer &= ~ignore_bits;
+		guest_efer |= host_efer & ignore_bits;
 
-	return true;
+		vmx->guest_msrs[efer_offset].data = guest_efer;
+		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+		return true;
+	}
 }
 
 static unsigned long segment_base(u16 selector)

  parent reply	other threads:[~2016-03-14 18:01 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-14 17:50 [PATCH 4.4 00/50] 4.4.6-stable review Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 01/50] arm64: account for sparsemem section alignment when choosing vmemmap offset Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 02/50] ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 03/50] ARM: dts: dra7: do not gate cpsw clock due to errata i877 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 04/50] ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 06/50] kvm: cap halt polling at exactly halt_poll_ns Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 08/50] KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 09/50] KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit Greg Kroah-Hartman
2016-03-14 17:50 ` Greg Kroah-Hartman [this message]
2016-03-14 17:50 ` [PATCH 4.4 11/50] KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 13/50] s390/dasd: fix diag 0x250 inline assembly Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 14/50] tracing: Fix check for cpu online when event is disabled Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 15/50] dmaengine: at_xdmac: fix residue computation Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 16/50] jffs2: reduce the breakage on recovery from halfway failed rename() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 17/50] ncpfs: fix a braino in OOM handling in ncp_fill_cache() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 18/50] ASoC: dapm: Fix ctl value accesses in a wrong type Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 19/50] ASoC: samsung: Use IRQ safe spin lock calls Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 20/50] ASoC: wm8994: Fix enum ctl accesses in a wrong type Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 21/50] ASoC: wm8958: " Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 22/50] ovl: ignore lower entries when checking purity of non-directory entries Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 23/50] ovl: fix working on distributed fs as lower layer Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 24/50] wext: fix message delay/ordering Greg Kroah-Hartman
2016-03-16 12:49   ` Ben Hutchings
2016-03-14 17:50 ` [PATCH 4.4 25/50] cfg80211/wext: fix message ordering Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 26/50] can: gs_usb: fixed disconnect bug by removing erroneous use of kfree() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 27/50] iwlwifi: mvm: inc pending frames counter also when txing non-sta Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 28/50] mac80211: minstrel: Change expected throughput unit back to Kbps Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 29/50] mac80211: fix use of uninitialised values in RX aggregation Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 30/50] mac80211: minstrel_ht: set default tx aggregation timeout to 0 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 32/50] mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 33/50] mac80211: Fix Public Action frame RX in AP mode Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 34/50] gpu: ipu-v3: Do not bail out on missing optional port nodes Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 35/50] x86/mm: Fix slow_virt_to_phys() for X86_PAE again Greg Kroah-Hartman
2016-03-14 17:50   ` Greg Kroah-Hartman
2016-03-14 17:50   ` Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 39/50] Revert "drm/radeon/pm: adjust display configuration after powerstate" Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 40/50] powerpc: Fix dedotify for binutils >= 2.26 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 41/50] powerpc/powernv: Add a kmsg_dumper that flushes console output on panic Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 42/50] powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 43/50] userfaultfd: dont block on the last VM updates at exit time Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 44/50] ovl: copy new uid/gid into overlayfs runtime inode Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 45/50] ovl: fix getcwd() failure after unsuccessful rmdir Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 46/50] MIPS: Fix build error when SMP is used without GIC Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 47/50] MIPS: smp.c: Fix uninitialised temp_foreign_map Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 48/50] block: dont optimize for non-cloned bio in bio_get_last_bvec() Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 49/50] target: Drop incorrect ABORT_TASK put for completed commands Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 50/50] ld-version: Fix awk regex compile failure Greg Kroah-Hartman
2016-03-14 23:12 ` [PATCH 4.4 00/50] 4.4.6-stable review Shuah Khan
2016-03-16 15:40   ` Greg Kroah-Hartman
2016-03-15  2:34 ` Guenter Roeck
2016-03-16 15:41   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160314175014.873643926@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.