LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: bp@suse.de
Cc: fengguang.wu@intel.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, lkp@01.org, x86@kernel.org,
	Daniel Borkmann <daniel@iogearbox.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Kees Cook <keescook@chromium.org>,
	Laura Abbott <labbott@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Alexei Starovoitov <ast@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH] x86-32: fix tlb flushing when lguest clears PGE
Date: Sat, 11 Mar 2017 01:31:19 +0100
Message-ID: <25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net> (raw)

Fengguang reported [1] random corruptions from various locations on
x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY
config") and 9d876e79df6a ("bpf: fix unlocking of jited image when
module ronx not set") that uses the former. While x86-32 doesn't
have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro()
got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test
kernel doesn't have module support built in and therefore never
had the DEBUG_SET_MODULE_RONX setting enabled.

After investigating the crashes further, it turned out that using
set_memory_ro() and set_memory_rw() didn't have the desired effect,
for example, setting the pages as read-only on x86-32 would still
let probe_kernel_write() succeed without error. This behavior would
manifest itself in situations where the vmalloc'ed buffer was accessed
prior to set_memory_*() such as in case of bpf_prog_alloc(). In
cases where it wasn't, the page attribute changes seemed to have
taken effect, leading to the conclusion that a TLB invalidate
didn't happen. Moreover, it turned out that this issue reproduced
with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the
issue occurs, change_page_attr_set_clr() did trigger a TLB flush
as expected via __flush_tlb_all() through cpa_flush_range(), though.

There are 3 variants for issuing a TLB flush: invpcid_flush_all()
(depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE),
cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush.
For "-cpu host" case in my setup, the flush used invpcid_flush_all()
variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching
the kvm64 case to cr3 manually worked fine, and further investigating
the cr4 one turned out that X86_CR4_PGE bit was not set in cr4
register, meaning the __native_flush_tlb_global_irq_disabled() wrote
cr4 twice with the same value instead of clearing X86_CR4_PGE in the
first write to trigger the flush.

It turned out that X86_CR4_PGE was cleared from cr4 during init
from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE
bit is also cleared from there due to concerns of using PGE in
guest kernel that can lead to hard to trace bugs (see bff672e630a0
("lguest: documentation V: Host") in init()). The CPU feature bits
are cleared in dynamic boot_cpu_data, but they never propagated to
__flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has()
for testing which variant of TLB flushing to use, meaning they still
used the old setting of the host kernel.

Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would
propagate to static_cpu_has() checks is too late at this point as
sections have been patched already, so for now, it seems reasonable
to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to
commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This
lets the TLB flush trigger via cr3 as originally intended, properly
makes the new page attributes visible and thus fixes the crashes
seen by Fengguang.

  [1] https://lkml.org/lkml/2017/3/1/344

Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
---
 arch/x86/include/asm/tlbflush.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6fa8594..fc5abff 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -188,7 +188,7 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 
 static inline void __flush_tlb_all(void)
 {
-	if (static_cpu_has(X86_FEATURE_PGE))
+	if (boot_cpu_has(X86_FEATURE_PGE))
 		__flush_tlb_global();
 	else
 		__flush_tlb();
-- 
1.9.3

             reply index

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-11  0:31 Daniel Borkmann [this message]
2017-03-11 10:13 ` [tip:x86/urgent] x86/tlb: Fix " tip-bot for Daniel Borkmann
2017-03-12 10:27 ` tip-bot for Daniel Borkmann
2017-03-13  2:02 ` [PATCH] x86-32: fix " Kees Cook
2017-03-13 18:48   ` Rustad, Mark D

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=ast@kernel.org \
    --cc=bp@suse.de \
    --cc=davem@davemloft.net \
    --cc=fengguang.wu@intel.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=labbott@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git