From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Fengguang Wu <fengguang.wu@intel.com>, Daniel Borkmann <daniel@iogearbox.net>, bp@suse.de, Kees Cook <keescook@chromium.org>, "David S. Miller" <davem@davemloft.net>, netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>, Alexei Starovoitov <ast@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, lkp@01.org, Laura Abbott <labbott@redhat.com>, Thomas Gleixner <tglx@linutronix.de> Subject: [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE Date: Mon, 13 Mar 2017 16:39:46 +0800 [thread overview] Message-ID: <20170313083417.860726585@linuxfoundation.org> (raw) In-Reply-To: <20170313083414.786638815@linuxfoundation.org> 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Daniel Borkmann <daniel@iogearbox.net> commit 2c4ea6e28dbf15ab93632c5c189f3948366b8885 upstream. Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- arch/x86/include/asm/tlbflush.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -188,7 +188,7 @@ static inline void __native_flush_tlb_si static inline void __flush_tlb_all(void) { - if (static_cpu_has(X86_FEATURE_PGE)) + if (boot_cpu_has(X86_FEATURE_PGE)) __flush_tlb_global(); else __flush_tlb();
WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> To: lkp@lists.01.org Subject: [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE Date: Mon, 13 Mar 2017 16:39:46 +0800 [thread overview] Message-ID: <20170313083417.860726585@linuxfoundation.org> (raw) In-Reply-To: <20170313083414.786638815@linuxfoundation.org> [-- Attachment #1: Type: text/plain, Size: 4512 bytes --] 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Daniel Borkmann <daniel@iogearbox.net> commit 2c4ea6e28dbf15ab93632c5c189f3948366b8885 upstream. Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp(a)suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev(a)vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp(a)01.org Cc: Laura Abbott <labbott@redhat.com> Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl(a)wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel(a)iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- arch/x86/include/asm/tlbflush.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -188,7 +188,7 @@ static inline void __native_flush_tlb_si static inline void __flush_tlb_all(void) { - if (static_cpu_has(X86_FEATURE_PGE)) + if (boot_cpu_has(X86_FEATURE_PGE)) __flush_tlb_global(); else __flush_tlb();
next prev parent reply other threads:[~2017-03-13 9:05 UTC|newest] Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-03-13 8:38 [PATCH 4.9 00/63] 4.9.15-stable review Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 01/63] tty: n_hdlc: get rid of racy n_hdlc.tbuf Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 02/63] serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 03/63] KVM: s390: Disable dirty log retrieval for UCONTROL guests Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 04/63] KVM: VMX: use correct vmcs_read/write for guest segment selector/base Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 05/63] Bluetooth: Add another AR3012 04ca:3018 device Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 06/63] s390/qdio: clear DSCI prior to scanning multiple input queues Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 07/63] s390/dcssblk: fix device size calculation in dcssblk_direct_access() Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 08/63] s390/kdump: Use "LINUX" ELF note name instead of "CORE" Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 09/63] s390/chsc: Add exception handler for CHSC instruction Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 10/63] s390: TASK_SIZE for kernel threads Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 11/63] s390: make setup_randomness work Greg Kroah-Hartman 2017-03-13 8:38 ` [PATCH 4.9 12/63] s390: use correct input data address for setup_randomness Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 13/63] net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put() Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 14/63] cxl: Prevent read/write to AFU config space while AFU not configured Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 15/63] cxl: fix nested locking hang during EEH hotplug Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 16/63] brcmfmac: fix incorrect event channel deduction Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 17/63] mnt: Tuck mounts under others instead of creating shadow/side mounts Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 18/63] IB/ipoib: Fix deadlock between rmmod and set_mode Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 19/63] IB/IPoIB: Add destination address when re-queue packet Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 20/63] IB/mlx5: Fix out-of-bound access Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 21/63] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 22/63] IB/srp: Avoid that duplicate responses trigger a kernel bug Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 23/63] IB/srp: Fix race conditions related to task management Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 24/63] fs: Better permission checking for submounts Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 25/63] orangefs: Use RCU for destroy_inode Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 26/63] memory/atmel-ebi: Fix ns <-> cycles conversions Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 27/63] ktest: Fix child exit code processing Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 28/63] ceph: remove req from unsafe list when unregistering it Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 29/63] pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot() Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 30/63] pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 31/63] target: Fix NULL dereference during LUN lookup + active I/O shutdown Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 32/63] drivers/pci/hotplug: Handle presence detection change properly Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 33/63] drivers/pci/hotplug: Fix initial state for empty slot Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 34/63] nlm: Ensure callback code also checks that the files match Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 35/63] pwm: pca9685: Fix period change with same duty cycle Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 36/63] xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 37/63] nfit, libnvdimm: fix interleave set cookie calculation Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 38/63] mac80211: flush delayed work when entering suspend Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 39/63] mac80211: dont reorder frames with SN smaller than SSN Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 40/63] mac80211: dont handle filtered frames within a BA session Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 43/63] drm/ast: Fix test for VGA enabled Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 44/63] drm/ast: Call open_key before enable_mmio in POST code Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 45/63] drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 48/63] drm/vmwgfx: Work around drm removal of control nodes Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 49/63] dmaengine: imx-sdma - correct the dma transfer residue calculation Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 50/63] drm/imx: imx-tve: Do not set the regulator voltage Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 51/63] drm/atomic: fix an error code in mode_fixup() Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 52/63] drm/i915/gvt: Disable access to stolen memory as a guest Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 53/63] drm: Cancel drm_fb_helper_dirty_work on unload Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 54/63] drm: Cancel drm_fb_helper_resume_work " Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 56/63] drm/i915: Fix not finding the VBT when it overlaps with OPREGION_ASLE_EXT Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 57/63] libceph: use BUG() instead of BUG_ON(1) Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 58/63] x86, mm: fix gup_pte_range() vs DAX mappings Greg Kroah-Hartman 2017-03-13 8:39 ` Greg Kroah-Hartman [this message] 2017-03-13 8:39 ` [PATCH 4.9 59/63] x86/tlb: Fix tlb flushing when lguest clears PGE Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 60/63] thp: fix another corner case of munlock() vs. THPs Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 61/63] mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 62/63] fat: fix using uninitialized fields of fat_inode/fsinfo_inode Greg Kroah-Hartman 2017-03-13 8:39 ` [PATCH 4.9 63/63] drivers: hv: Turn off write permission on the hypercall page Greg Kroah-Hartman 2017-03-13 22:37 ` [PATCH 4.9 00/63] 4.9.15-stable review Guenter Roeck
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20170313083417.860726585@linuxfoundation.org \ --to=gregkh@linuxfoundation.org \ --cc=ast@kernel.org \ --cc=bp@suse.de \ --cc=daniel@iogearbox.net \ --cc=davem@davemloft.net \ --cc=fengguang.wu@intel.com \ --cc=keescook@chromium.org \ --cc=labbott@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=lkp@01.org \ --cc=netdev@vger.kernel.org \ --cc=rusty@rustcorp.com.au \ --cc=stable@vger.kernel.org \ --cc=tglx@linutronix.de \ --cc=torvalds@linux-foundation.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.