All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Fengguang Wu <fengguang.wu@intel.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	bp@suse.de, Kees Cook <keescook@chromium.org>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
	Alexei Starovoitov <ast@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	lkp@01.org, Laura Abbott <labbott@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE
Date: Mon, 13 Mar 2017 16:39:46 +0800	[thread overview]
Message-ID: <20170313083417.860726585@linuxfoundation.org> (raw)
In-Reply-To: <20170313083414.786638815@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

commit 2c4ea6e28dbf15ab93632c5c189f3948366b8885 upstream.

Fengguang reported random corruptions from various locations on x86-32
after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and
9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set")
that uses the former. While x86-32 doesn't have a JIT like x86_64, the
bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
enabled.

After investigating the crashes further, it turned out that using
set_memory_ro() and set_memory_rw() didn't have the desired effect, for
example, setting the pages as read-only on x86-32 would still let
probe_kernel_write() succeed without error. This behavior would manifest
itself in situations where the vmalloc'ed buffer was accessed prior to
set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
wasn't, the page attribute changes seemed to have taken effect, leading to
the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
"-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
though.

There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
(depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
worked fine, and further investigating the cr4 one turned out that
X86_CR4_PGE bit was not set in cr4 register, meaning the
__native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
value instead of clearing X86_CR4_PGE in the first write to trigger the
flush.

It turned out that X86_CR4_PGE was cleared from cr4 during init from
lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
cleared from there due to concerns of using PGE in guest kernel that can
lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V:
Host") in init()). The CPU feature bits are cleared in dynamic
boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
flushing to use, meaning they still used the old setting of the host
kernel.

Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
to static_cpu_has() checks is too late at this point as sections have been
patched already, so for now, it seems reasonable to switch back to
boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b
("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
cr3 as originally intended, properly makes the new page attributes visible
and thus fixes the crashes seen by Fengguang.

Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: bp@suse.de
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: lkp@01.org
Cc: Laura Abbott <labbott@redhat.com>
Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/tlbflush.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -188,7 +188,7 @@ static inline void __native_flush_tlb_si
 
 static inline void __flush_tlb_all(void)
 {
-	if (static_cpu_has(X86_FEATURE_PGE))
+	if (boot_cpu_has(X86_FEATURE_PGE))
 		__flush_tlb_global();
 	else
 		__flush_tlb();

WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: lkp@lists.01.org
Subject: [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE
Date: Mon, 13 Mar 2017 16:39:46 +0800	[thread overview]
Message-ID: <20170313083417.860726585@linuxfoundation.org> (raw)
In-Reply-To: <20170313083414.786638815@linuxfoundation.org>

[-- Attachment #1: Type: text/plain, Size: 4512 bytes --]

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

commit 2c4ea6e28dbf15ab93632c5c189f3948366b8885 upstream.

Fengguang reported random corruptions from various locations on x86-32
after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and
9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set")
that uses the former. While x86-32 doesn't have a JIT like x86_64, the
bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
enabled.

After investigating the crashes further, it turned out that using
set_memory_ro() and set_memory_rw() didn't have the desired effect, for
example, setting the pages as read-only on x86-32 would still let
probe_kernel_write() succeed without error. This behavior would manifest
itself in situations where the vmalloc'ed buffer was accessed prior to
set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
wasn't, the page attribute changes seemed to have taken effect, leading to
the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
"-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
though.

There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
(depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
worked fine, and further investigating the cr4 one turned out that
X86_CR4_PGE bit was not set in cr4 register, meaning the
__native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
value instead of clearing X86_CR4_PGE in the first write to trigger the
flush.

It turned out that X86_CR4_PGE was cleared from cr4 during init from
lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
cleared from there due to concerns of using PGE in guest kernel that can
lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V:
Host") in init()). The CPU feature bits are cleared in dynamic
boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
flushing to use, meaning they still used the old setting of the host
kernel.

Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
to static_cpu_has() checks is too late at this point as sections have been
patched already, so for now, it seems reasonable to switch back to
boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b
("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
cr3 as originally intended, properly makes the new page attributes visible
and thus fixes the crashes seen by Fengguang.

Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: bp(a)suse.de
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev(a)vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: lkp(a)01.org
Cc: Laura Abbott <labbott@redhat.com>
Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl(a)wfg-t540p.sh.intel.com
Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel(a)iogearbox.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/tlbflush.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -188,7 +188,7 @@ static inline void __native_flush_tlb_si
 
 static inline void __flush_tlb_all(void)
 {
-	if (static_cpu_has(X86_FEATURE_PGE))
+	if (boot_cpu_has(X86_FEATURE_PGE))
 		__flush_tlb_global();
 	else
 		__flush_tlb();



  parent reply	other threads:[~2017-03-13  9:05 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-13  8:38 [PATCH 4.9 00/63] 4.9.15-stable review Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 01/63] tty: n_hdlc: get rid of racy n_hdlc.tbuf Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 02/63] serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 03/63] KVM: s390: Disable dirty log retrieval for UCONTROL guests Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 04/63] KVM: VMX: use correct vmcs_read/write for guest segment selector/base Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 05/63] Bluetooth: Add another AR3012 04ca:3018 device Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 06/63] s390/qdio: clear DSCI prior to scanning multiple input queues Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 07/63] s390/dcssblk: fix device size calculation in dcssblk_direct_access() Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 08/63] s390/kdump: Use "LINUX" ELF note name instead of "CORE" Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 09/63] s390/chsc: Add exception handler for CHSC instruction Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 10/63] s390: TASK_SIZE for kernel threads Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 11/63] s390: make setup_randomness work Greg Kroah-Hartman
2017-03-13  8:38 ` [PATCH 4.9 12/63] s390: use correct input data address for setup_randomness Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 13/63] net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 14/63] cxl: Prevent read/write to AFU config space while AFU not configured Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 15/63] cxl: fix nested locking hang during EEH hotplug Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 16/63] brcmfmac: fix incorrect event channel deduction Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 17/63] mnt: Tuck mounts under others instead of creating shadow/side mounts Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 18/63] IB/ipoib: Fix deadlock between rmmod and set_mode Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 19/63] IB/IPoIB: Add destination address when re-queue packet Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 20/63] IB/mlx5: Fix out-of-bound access Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 21/63] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 22/63] IB/srp: Avoid that duplicate responses trigger a kernel bug Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 23/63] IB/srp: Fix race conditions related to task management Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 24/63] fs: Better permission checking for submounts Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 25/63] orangefs: Use RCU for destroy_inode Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 26/63] memory/atmel-ebi: Fix ns <-> cycles conversions Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 27/63] ktest: Fix child exit code processing Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 28/63] ceph: remove req from unsafe list when unregistering it Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 29/63] pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 30/63] pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 31/63] target: Fix NULL dereference during LUN lookup + active I/O shutdown Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 32/63] drivers/pci/hotplug: Handle presence detection change properly Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 33/63] drivers/pci/hotplug: Fix initial state for empty slot Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 34/63] nlm: Ensure callback code also checks that the files match Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 35/63] pwm: pca9685: Fix period change with same duty cycle Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 36/63] xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 37/63] nfit, libnvdimm: fix interleave set cookie calculation Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 38/63] mac80211: flush delayed work when entering suspend Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 39/63] mac80211: dont reorder frames with SN smaller than SSN Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 40/63] mac80211: dont handle filtered frames within a BA session Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 43/63] drm/ast: Fix test for VGA enabled Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 44/63] drm/ast: Call open_key before enable_mmio in POST code Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 45/63] drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 48/63] drm/vmwgfx: Work around drm removal of control nodes Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 49/63] dmaengine: imx-sdma - correct the dma transfer residue calculation Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 50/63] drm/imx: imx-tve: Do not set the regulator voltage Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 51/63] drm/atomic: fix an error code in mode_fixup() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 52/63] drm/i915/gvt: Disable access to stolen memory as a guest Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 53/63] drm: Cancel drm_fb_helper_dirty_work on unload Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 54/63] drm: Cancel drm_fb_helper_resume_work " Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 56/63] drm/i915: Fix not finding the VBT when it overlaps with OPREGION_ASLE_EXT Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 57/63] libceph: use BUG() instead of BUG_ON(1) Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 58/63] x86, mm: fix gup_pte_range() vs DAX mappings Greg Kroah-Hartman
2017-03-13  8:39 ` Greg Kroah-Hartman [this message]
2017-03-13  8:39   ` [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 60/63] thp: fix another corner case of munlock() vs. THPs Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 61/63] mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 62/63] fat: fix using uninitialized fields of fat_inode/fsinfo_inode Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.9 63/63] drivers: hv: Turn off write permission on the hypercall page Greg Kroah-Hartman
2017-03-13 22:37 ` [PATCH 4.9 00/63] 4.9.15-stable review Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170313083417.860726585@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=ast@kernel.org \
    --cc=bp@suse.de \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=fengguang.wu@intel.com \
    --cc=keescook@chromium.org \
    --cc=labbott@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=netdev@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.