linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch 24/60] x86/paravirt: Dont patch flush_tlb_single
       [not found] <20171204140706.296109558@linutronix.de>
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-05 12:18   ` Juergen Gross
  2017-12-04 14:07 ` [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y Thomas Gleixner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen,
	michael.schwarz, linux-mm, Borislav Petkov, moritz.lipp,
	richard.fellner

[-- Attachment #1: x86-paravirt--Dont_patch_flush_tlb_single.patch --]
[-- Type: text/plain, Size: 2086 bytes --]

From: Thomas Gleixner <tglx@linutronix.de>

native_flush_tlb_single() will be changed with the upcoming
KERNEL_PAGE_TABLE_ISOLATION feature. This requires to have more code in
there than INVLPG.

Remove the paravirt patching for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: moritz.lipp@iaik.tugraz.at
Cc: keescook@google.com
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at

---
 arch/x86/kernel/paravirt_patch_64.c |    2 --
 1 file changed, 2 deletions(-)

--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -10,7 +10,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq;
 DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
 DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
 DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
 DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
 
 DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq");
@@ -60,7 +59,6 @@ unsigned native_patch(u8 type, u16 clobb
 		PATCH_SITE(pv_mmu_ops, read_cr2);
 		PATCH_SITE(pv_mmu_ops, read_cr3);
 		PATCH_SITE(pv_mmu_ops, write_cr3);
-		PATCH_SITE(pv_mmu_ops, flush_tlb_single);
 		PATCH_SITE(pv_cpu_ops, wbinvd);
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 		case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y
       [not found] <20171204140706.296109558@linutronix.de>
  2017-12-04 14:07 ` [patch 24/60] x86/paravirt: Dont patch flush_tlb_single Thomas Gleixner
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-05 14:34   ` Borislav Petkov
  2017-12-04 14:07 ` [patch 29/60] x86/mm/kpti: Prepare the x86/entry assembly code for entry/exit CR3 switching Thomas Gleixner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, richard.fellner, michael.schwarz

[-- Attachment #1: x86-mm-kpti--Disable_global_pages_if_KERNEL_PAGE_TABLE_ISOLATION-y.patch --]
[-- Type: text/plain, Size: 2974 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

Global pages stay in the TLB across context switches.  Since all contexts
share the same kernel mapping, these mappings are marked as global pages
so kernel entries in the TLB are not flushed out on a context switch.

But, even having these entries in the TLB opens up something that an
attacker can use, such as the double-page-fault attack:

   http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf

That means that even when KERNEL_PAGE_TABLE_ISOLATION switches page tables
on return to user space the global pages would stay in the TLB cache.

Disable global pages so that kernel TLB entries can be flushed before
returning to user space. This way, all accesses to kernel addresses from
userspace result in a TLB miss independent of the existence of a kernel
mapping.

Supress global pages via the __supported_pte_mask. The user space
mappings set PAGE_GLOBAL for the minimal kernel mappings which are
required for entry/exit. These mappings are set up manually so the
filtering does not take place.

[ The __supported_pte_mask simplification was written by Thomas Gleixner. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: richard.fellner@student.tugraz.at
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171123003441.63DDFC6F@viggo.jf.intel.com

---
 arch/x86/mm/init.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,6 +161,12 @@ struct map_range {
 
 static int page_size_mask;
 
+static void enable_global_pages(void)
+{
+	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_KPTI))
+		__supported_pte_mask |= _PAGE_GLOBAL;
+}
+
 static void __init probe_page_size_mask(void)
 {
 	/*
@@ -179,11 +185,11 @@ static void __init probe_page_size_mask(
 		cr4_set_bits_and_update_boot(X86_CR4_PSE);
 
 	/* Enable PGE if available */
+	__supported_pte_mask &= ~_PAGE_GLOBAL;
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
 		cr4_set_bits_and_update_boot(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	} else
-		__supported_pte_mask &= ~_PAGE_GLOBAL;
+		enable_global_pages();
+	}
 
 	/* Enable 1 GB linear kernel mappings if available: */
 	if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 29/60] x86/mm/kpti: Prepare the x86/entry assembly code for entry/exit CR3 switching
       [not found] <20171204140706.296109558@linutronix.de>
  2017-12-04 14:07 ` [patch 24/60] x86/paravirt: Dont patch flush_tlb_single Thomas Gleixner
  2017-12-04 14:07 ` [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y Thomas Gleixner
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-04 14:07 ` [patch 48/60] x86/mm: Move the CR3 construction functions to tlbflush.h Thomas Gleixner
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	Borislav Petkov, moritz.lipp, linux-mm, richard.fellner,
	michael.schwarz

[-- Attachment #1: x86-mm-kpti--Prepare_the_x86-entry_assembly_code_for_entry-exit_CR3_switching.patch --]
[-- Type: text/plain, Size: 10799 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

KERNEL_PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when
it enters the kernel and switch back when it exits.  This essentially needs
to be done before leaving assembly code.

This is extra challenging because the switching context is tricky: the
registers that can be clobbered can vary.  It is also hard to store things
on the stack because there is an established ABI (ptregs) or the stack is
entirely unsafe to use.

Establish a set of macros that allow changing to the user and kernel CR3
values.

Interactions with SWAPGS:
Previous versions of the KERNEL_PAGE_TABLE_ISOLATION code relied on having
per-CPU scratch space to save/restore a register that can be used for the
CR3 MOV.  The %GS register is used to index into our per-CPU space, so
SWAPGS *had* to be done before the CR3 switch.  That scratch space is gone
now, but the semantic that SWAPGS must be done before the CR3 MOV is
retained.  This is good to keep because it is not that hard to do and it
allows to do things like add per-CPU debugging information.

What this does in the NMI code is worth pointing out.  NMIs can interrupt
*any* context and they can also be nested with NMIs interrupting other
NMIs.  The comments below ".Lnmi_from_kernel" explain the format of the
stack during this situation.  Changing the format of this stack is hard.
Instead of storing the old CR3 value on the stack, this depends on the
*regular* register save/restore mechanism and then uses %r14 to keep CR3
during the NMI.  It is callee-saved and will not be clobbered by the C NMI
handlers that get called.

[ peterz: ESPFIX optimization ]

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: richard.fellner@student.tugraz.at
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171123003442.2D047A7D@viggo.jf.intel.com

---
 arch/x86/entry/calling.h         |   66 +++++++++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S        |   45 +++++++++++++++++++++++---
 arch/x86/entry/entry_64_compat.S |   24 +++++++++++++-
 3 files changed, 128 insertions(+), 7 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/jump_label.h>
 #include <asm/unwind_hints.h>
+#include <asm/cpufeatures.h>
+#include <asm/page_types.h>
 
 /*
 
@@ -187,6 +189,70 @@ For 32-bit we have the following convent
 #endif
 .endm
 
+#ifdef CONFIG_KERNEL_PAGE_TABLE_ISOLATION
+
+/* KERNEL_PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two halves: */
+#define KPTI_SWITCH_MASK (1<<PAGE_SHIFT)
+
+.macro ADJUST_KERNEL_CR3 reg:req
+	/* Clear "KERNEL_PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+	andq	$(~KPTI_SWITCH_MASK), \reg
+.endm
+
+.macro ADJUST_USER_CR3 reg:req
+	/* Move CR3 up a page to the user page tables: */
+	orq	$(KPTI_SWITCH_MASK), \reg
+.endm
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	mov	%cr3, \scratch_reg
+	ADJUST_KERNEL_CR3 \scratch_reg
+	mov	\scratch_reg, %cr3
+.endm
+
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+	mov	%cr3, \scratch_reg
+	ADJUST_USER_CR3 \scratch_reg
+	mov	\scratch_reg, %cr3
+.endm
+
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+	movq	%cr3, \scratch_reg
+	movq	\scratch_reg, \save_reg
+	/*
+	 * Is the switch bit zero?  This means the address is
+	 * up in real KERNEL_PAGE_TABLE_ISOLATION patches in a moment.
+	 */
+	testq	$(KPTI_SWITCH_MASK), \scratch_reg
+	jz	.Ldone_\@
+
+	ADJUST_KERNEL_CR3 \scratch_reg
+	movq	\scratch_reg, %cr3
+
+.Ldone_\@:
+.endm
+
+.macro RESTORE_CR3 save_reg:req
+	/*
+	 * The CR3 write could be avoided when not changing its value,
+	 * but would require a CR3 read *and* a scratch register.
+	 */
+	movq	\save_reg, %cr3
+.endm
+
+#else /* CONFIG_KERNEL_PAGE_TABLE_ISOLATION=n: */
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+.endm
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.endm
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+.endm
+.macro RESTORE_CR3 save_reg:req
+.endm
+
+#endif
+
 #endif /* CONFIG_X86_64 */
 
 /*
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -168,6 +168,9 @@ ENTRY(entry_SYSCALL_64_trampoline)
 	/* Stash the user RSP. */
 	movq	%rsp, RSP_SCRATCH
 
+	/* Note: using %rsp as a scratch reg. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
 	/* Load the top of the task stack into RSP */
 	movq	CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
 
@@ -208,6 +211,10 @@ ENTRY(entry_SYSCALL_64)
 
 	swapgs
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
+	/*
+	 * This path is not taken when KERNEL_PAGE_TABLE_ISOLATION is disabled so it
+	 * is not required to switch CR3.
+	 */
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/* Construct struct pt_regs on stack */
@@ -403,6 +410,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
+	SWITCH_TO_USER_CR3 scratch_reg=%rdi
 
 	popq	%rdi
 	popq	%rsp
@@ -740,6 +748,8 @@ GLOBAL(swapgs_restore_regs_and_return_to
 	 * We can do future final exit work right here.
 	 */
 
+	SWITCH_TO_USER_CR3 scratch_reg=%rdi
+
 	/* Restore RDI. */
 	popq	%rdi
 	SWAPGS
@@ -822,7 +832,9 @@ ENTRY(native_iret)
 	 */
 
 	pushq	%rdi				/* Stash user RDI */
-	SWAPGS
+	SWAPGS					/* to kernel GS */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi	/* to kernel CR3 */
+
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -838,7 +850,6 @@ ENTRY(native_iret)
 	/* Now RAX == RSP. */
 
 	andl	$0xffff0000, %eax		/* RAX = (RSP & 0xffff0000) */
-	popq	%rdi				/* Restore user RDI */
 
 	/*
 	 * espfix_stack[31:16] == 0.  The page tables are set up such that
@@ -849,7 +860,11 @@ ENTRY(native_iret)
 	 * still points to an RO alias of the ESPFIX stack.
 	 */
 	orq	PER_CPU_VAR(espfix_stack), %rax
-	SWAPGS
+
+	SWITCH_TO_USER_CR3 scratch_reg=%rdi	/* to user CR3 */
+	SWAPGS					/* to user GS */
+	popq	%rdi				/* Restore user RDI */
+
 	movq	%rax, %rsp
 	UNWIND_HINT_IRET_REGS offset=8
 
@@ -949,6 +964,8 @@ ENTRY(switch_to_thread_stack)
 	UNWIND_HINT_FUNC
 
 	pushq	%rdi
+	/* Need to switch before accessing the thread stack. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
 	movq	%rsp, %rdi
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 	UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
@@ -1250,7 +1267,11 @@ ENTRY(paranoid_entry)
 	js	1f				/* negative -> in kernel */
 	SWAPGS
 	xorl	%ebx, %ebx
-1:	ret
+
+1:
+	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+
+	ret
 END(paranoid_entry)
 
 /*
@@ -1272,6 +1293,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	RESTORE_CR3	save_reg=%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
 .Lparanoid_exit_no_swapgs:
@@ -1299,6 +1321,8 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	/* We have user CR3.  Change to kernel CR3. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -1345,6 +1369,7 @@ ENTRY(error_entry)
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
 	SWAPGS
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	jmp .Lerror_entry_done
 
 .Lbstep_iret:
@@ -1354,10 +1379,11 @@ ENTRY(error_entry)
 
 .Lerror_bad_iret:
 	/*
-	 * We came from an IRET to user mode, so we have user gsbase.
-	 * Switch to kernel gsbase:
+	 * We came from an IRET to user mode, so we have user
+	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
 	 */
 	SWAPGS
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
@@ -1389,6 +1415,10 @@ END(error_exit)
 /*
  * Runs on exception stack.  Xen PV does not go through this path at all,
  * so we can use real assembly here.
+ *
+ * Registers:
+ *	%r14: Used to save/restore the CR3 of the interrupted context
+ *	      when KERNEL_PAGE_TABLE_ISOLATION is in use.  Do not clobber.
  */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
@@ -1452,6 +1482,7 @@ ENTRY(nmi)
 
 	swapgs
 	cld
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 	UNWIND_HINT_IRET_REGS base=%rdx offset=8
@@ -1704,6 +1735,8 @@ ENTRY(nmi)
 	movq	$-1, %rsi
 	call	do_nmi
 
+	RESTORE_CR3 save_reg=%r14
+
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	nmi_restore
 nmi_swapgs:
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,10 @@
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
 	SWAPGS
+
+	/* We are about to clobber %rsp anyway, clobbering here is OK */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/*
@@ -216,6 +220,12 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
 	pushq   $0			/* pt_regs->r15 = 0 */
 
 	/*
+	 * We just saved %rdi so it is safe to clobber.  It is not
+	 * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+	 */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
+	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
 	 */
@@ -256,10 +266,22 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
 	 * when the system call started, which is already known to user
 	 * code.  We zero R8-R10 to avoid info leaks.
          */
+	movq	RSP-ORIG_RAX(%rsp), %rsp
+
+	/*
+	 * The original userspace %rsp (RSP-ORIG_RAX(%rsp)) is stored
+	 * on the process stack which is not mapped to userspace and
+	 * not readable after we SWITCH_TO_USER_CR3.  Delay the CR3
+	 * switch until after after the last reference to the process
+	 * stack.
+	 *
+	 * %r8 is zeroed before the sysret, thus safe to clobber.
+	 */
+	SWITCH_TO_USER_CR3 scratch_reg=%r8
+
 	xorq	%r8, %r8
 	xorq	%r9, %r9
 	xorq	%r10, %r10
-	movq	RSP-ORIG_RAX(%rsp), %rsp
 	swapgs
 	sysretl
 END(entry_SYSCALL_compat)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 48/60] x86/mm: Move the CR3 construction functions to tlbflush.h
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (2 preceding siblings ...)
  2017-12-04 14:07 ` [patch 29/60] x86/mm/kpti: Prepare the x86/entry assembly code for entry/exit CR3 switching Thomas Gleixner
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-04 14:07 ` [patch 49/60] x86/mm: Remove hard-coded ASID limit checks Thomas Gleixner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

[-- Attachment #1: x86-mm--Move_the_CR3_construction_functions_to_tlbflush.h.patch --]
[-- Type: text/plain, Size: 5807 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

For flushing the TLB, the ASID which has been programmed into the hardware
must be known.  That differs from what is in 'cpu_tlbstate'.

Add functions to transform the 'cpu_tlbstate' values into to the one
programmed into the hardware (CR3).

It's not easy to include mmu_context.h into tlbflush.h, so just move
the CR3 building over to tlbflush.h.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003502.CC87BF47@viggo.jf.intel.com

---
 arch/x86/include/asm/mmu_context.h |   29 +----------------------------
 arch/x86/include/asm/tlbflush.h    |   26 ++++++++++++++++++++++++++
 arch/x86/mm/tlb.c                  |    8 ++++----
 3 files changed, 31 insertions(+), 32 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -260,33 +260,6 @@ static inline bool arch_vma_access_permi
 }
 
 /*
- * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID
- * bits.  This serves two purposes.  It prevents a nasty situation in
- * which PCID-unaware code saves CR3, loads some other value (with PCID
- * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
- * the saved ASID was nonzero.  It also means that any bugs involving
- * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
- * deterministically.
- */
-
-static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
-{
-	if (static_cpu_has(X86_FEATURE_PCID)) {
-		VM_WARN_ON_ONCE(asid > 4094);
-		return __sme_pa(mm->pgd) | (asid + 1);
-	} else {
-		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(mm->pgd);
-	}
-}
-
-static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
-{
-	VM_WARN_ON_ONCE(asid > 4094);
-	return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
-}
-
-/*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
  *
@@ -295,7 +268,7 @@ static inline unsigned long build_cr3_no
  */
 static inline unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
 		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
 
 	/* For now, be very restrictive about when this can be called. */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -75,6 +75,32 @@ static inline u64 inc_mm_tlb_gen(struct
 	return new_tlb_gen;
 }
 
+/*
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
+ * This serves two purposes.  It prevents a nasty situation in which
+ * PCID-unaware code saves CR3, loads some other value (with PCID == 0),
+ * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved
+ * ASID was nonzero.  It also means that any bugs involving loading a
+ * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically.
+ */
+struct pgd_t;
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+{
+	if (static_cpu_has(X86_FEATURE_PCID)) {
+		VM_WARN_ON_ONCE(asid > 4094);
+		return __sme_pa(pgd) | (asid + 1);
+	} else {
+		VM_WARN_ON_ONCE(asid != 0);
+		return __sme_pa(pgd);
+	}
+}
+
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+{
+	VM_WARN_ON_ONCE(asid > 4094);
+	return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -128,7 +128,7 @@ void switch_mm_irqs_off(struct mm_struct
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -195,7 +195,7 @@ void switch_mm_irqs_off(struct mm_struct
 		if (need_flush) {
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-			write_cr3(build_cr3(next, new_asid));
+			write_cr3(build_cr3(next->pgd, new_asid));
 
 			/*
 			 * NB: This gets called via leave_mm() in the idle path
@@ -208,7 +208,7 @@ void switch_mm_irqs_off(struct mm_struct
 			trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 		} else {
 			/* The new ASID is already up to date. */
-			write_cr3(build_cr3_noflush(next, new_asid));
+			write_cr3(build_cr3_noflush(next->pgd, new_asid));
 
 			/* See above wrt _rcuidle. */
 			trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
@@ -288,7 +288,7 @@ void initialize_tlbstate_and_flush(void)
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
 	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm, 0));
+	write_cr3(build_cr3(mm->pgd, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 49/60] x86/mm: Remove hard-coded ASID limit checks
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (3 preceding siblings ...)
  2017-12-04 14:07 ` [patch 48/60] x86/mm: Move the CR3 construction functions to tlbflush.h Thomas Gleixner
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-04 14:07 ` [patch 50/60] x86/mm: Put MMU to hardware ASID translation in one place Thomas Gleixner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

[-- Attachment #1: x86-mm--Remove_hard-coded_ASID_limit_checks.patch --]
[-- Type: text/plain, Size: 2805 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

First, it's nice to remove the magic numbers.

Second, KERNEL_PAGE_TABLE_ISOLATION is going to consume half of the
available ASID space.  The space is currently unused, but add a comment to
spell out this new restriction.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003504.57EDB845@viggo.jf.intel.com

---
 arch/x86/include/asm/tlbflush.h |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -75,6 +75,22 @@ static inline u64 inc_mm_tlb_gen(struct
 	return new_tlb_gen;
 }
 
+/* There are 12 bits of space for ASIDS in CR3 */
+#define CR3_HW_ASID_BITS		12
+/*
+ * When enabled, KERNEL_PAGE_TABLE_ISOLATION consumes a single bit for
+ * user/kernel switches
+ */
+#define KPTI_CONSUMED_ASID_BITS		0
+
+#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - KPTI_CONSUMED_ASID_BITS)
+/*
+ * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid.  -1 below to account
+ * for them being zero-based.  Another -1 is because ASID 0 is reserved for
+ * use by non-PCID-aware users.
+ */
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
+
 /*
  * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
  * This serves two purposes.  It prevents a nasty situation in which
@@ -87,7 +103,7 @@ struct pgd_t;
 static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
 {
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		VM_WARN_ON_ONCE(asid > 4094);
+		VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 		return __sme_pa(pgd) | (asid + 1);
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
@@ -97,7 +113,7 @@ static inline unsigned long build_cr3(pg
 
 static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 {
-	VM_WARN_ON_ONCE(asid > 4094);
+	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
 }
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 50/60] x86/mm: Put MMU to hardware ASID translation in one place
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (4 preceding siblings ...)
  2017-12-04 14:07 ` [patch 49/60] x86/mm: Remove hard-coded ASID limit checks Thomas Gleixner
@ 2017-12-04 14:07 ` Thomas Gleixner
  2017-12-04 14:08 ` [patch 56/60] x86/mm/kpti: Disable native VSYSCALL Thomas Gleixner
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:07 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

[-- Attachment #1: x86-mm--Put_MMU-to-h-w_ASID_translation_in_one_place.patch --]
[-- Type: text/plain, Size: 3485 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

There are effectively two ASID types:

 1. The one stored in the mmu_context that goes from 0..5
 2. The one programmed into the hardware that goes from 1..6

This consolidates the locations where converting between the two (by doing
a +1) to a single place which gives us a nice place to comment.
KERNEL_PAGE_TABLE_ISOLATION will also need to, given an ASID, know which
hardware ASID to flush for the userspace mapping.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003506.67E81D7F@viggo.jf.intel.com

---
 arch/x86/include/asm/tlbflush.h |   29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -91,20 +91,26 @@ static inline u64 inc_mm_tlb_gen(struct
  */
 #define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
 
-/*
- * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
- * This serves two purposes.  It prevents a nasty situation in which
- * PCID-unaware code saves CR3, loads some other value (with PCID == 0),
- * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved
- * ASID was nonzero.  It also means that any bugs involving loading a
- * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically.
- */
+static inline u16 kern_pcid(u16 asid)
+{
+	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+	/*
+	 * If PCID is on, ASID-aware code paths put the ASID+1 into the
+	 * PCID bits.  This serves two purposes.  It prevents a nasty
+	 * situation in which PCID-unaware code saves CR3, loads some other
+	 * value (with PCID == 0), and then restores CR3, thus corrupting
+	 * the TLB for ASID 0 if the saved ASID was nonzero.  It also means
+	 * that any bugs involving loading a PCID-enabled CR3 with
+	 * CR4.PCIDE off will trigger deterministically.
+	 */
+	return asid + 1;
+}
+
 struct pgd_t;
 static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
 {
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-		return __sme_pa(pgd) | (asid + 1);
+		return __sme_pa(pgd) | kern_pcid(asid);
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
 		return __sme_pa(pgd);
@@ -114,7 +120,8 @@ static inline unsigned long build_cr3(pg
 static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-	return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
+	VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
 #ifdef CONFIG_PARAVIRT


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 56/60] x86/mm/kpti: Disable native VSYSCALL
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (5 preceding siblings ...)
  2017-12-04 14:07 ` [patch 50/60] x86/mm: Put MMU to hardware ASID translation in one place Thomas Gleixner
@ 2017-12-04 14:08 ` Thomas Gleixner
  2017-12-04 22:33   ` Andy Lutomirski
  2017-12-04 14:08 ` [patch 57/60] x86/mm/kpti: Add Kconfig Thomas Gleixner
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:08 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

[-- Attachment #1: x86-mm-kpti--Disable_native_VSYSCALL.patch --]
[-- Type: text/plain, Size: 2863 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

The KERNEL_PAGE_TABLE_ISOLATION code attempts to "poison" the user
portion of the kernel page tables. It detects entries that it wants that it
wants to poison in two ways:

 * Looking for addresses >= PAGE_OFFSET

 * Looking for entries without _PAGE_USER set

But, to allow the _PAGE_USER check to work, it must never be set on
init_mm entries, and an earlier patch in this series ensured that it
will never be set.

The VDSO is at a address >= PAGE_OFFSET and it is also mapped by init_mm.
Because of the earlier, KERNEL_PAGE_TABLE_ISOLATION-enforced restriction,
_PAGE_USER is never set which makes the VDSO unreadable to userspace.

This makes the "NATIVE" case totally unusable since userspace can not even
see the memory any more.  Disable it whenever KERNEL_PAGE_TABLE_ISOLATION
is enabled.

Also add some help text about how KERNEL_PAGE_TABLE_ISOLATION might
affect the emulation case as well.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003513.10CAD896@viggo.jf.intel.com

---
 arch/x86/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2249,6 +2249,9 @@ choice
 
 	config LEGACY_VSYSCALL_NATIVE
 		bool "Native"
+		# The VSYSCALL page comes from the kernel page tables
+		# and is not available when KERNEL_PAGE_TABLE_ISOLATION is enabled.
+		depends on !KERNEL_PAGE_TABLE_ISOLATION
 		help
 		  Actual executable code is located in the fixed vsyscall
 		  address mapping, implementing time() efficiently. Since
@@ -2266,6 +2269,11 @@ choice
 		  exploits. This configuration is recommended when userspace
 		  still uses the vsyscall area.
 
+		  When KERNEL_PAGE_TABLE_ISOLATION is enabled, the vsyscall area will become
+		  unreadable.  This emulation option still works, but KERNEL_PAGE_TABLE_ISOLATION
+		  will make it harder to do things like trace code using the
+		  emulation.
+
 	config LEGACY_VSYSCALL_NONE
 		bool "None"
 		help


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 57/60] x86/mm/kpti: Add Kconfig
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (6 preceding siblings ...)
  2017-12-04 14:08 ` [patch 56/60] x86/mm/kpti: Disable native VSYSCALL Thomas Gleixner
@ 2017-12-04 14:08 ` Thomas Gleixner
  2017-12-04 16:54   ` Andy Lutomirski
  2017-12-04 14:08 ` [patch 59/60] x86/mm/dump_pagetables: Check user space page table for WX pages Thomas Gleixner
  2017-12-04 14:08 ` [patch 60/60] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
  9 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:08 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

[-- Attachment #1: x86-mm-kpti--Add_Kconfig.patch --]
[-- Type: text/plain, Size: 2573 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

Finally allow CONFIG_KERNEL_PAGE_TABLE_ISOLATION to be enabled.

PARAVIRT generally requires that the kernel not manage its own page tables.
It also means that the hypervisor and kernel must agree wholeheartedly
about what format the page tables are in and what they contain.
KERNEL_PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they
can not be used together.

I've seen conflicting feedback from maintainers lately about whether they
want the Kconfig magic to go first or last in a patch series.  It's going
last here because the partially-applied series leads to kernels that can
not boot in a bunch of cases.  I did a run through the entire series with
CONFIG_KERNEL_PAGE_TABLE_ISOLATION=y to look for build errors, though.

[ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171123003524.88C90659@viggo.jf.intel.com

---
 security/Kconfig |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -54,6 +54,16 @@ config SECURITY_NETWORK
 	  implement socket and networking access controls.
 	  If you are unsure how to answer this question, answer N.
 
+config KERNEL_PAGE_TABLE_ISOLATION
+	bool "Remove the kernel mapping in user mode"
+	depends on X86_64 && JUMP_LABEL
+	help
+	  This feature reduces the number of hardware side channels by
+	  ensuring that the majority of kernel addresses are not mapped
+	  into userspace.
+
+	  See Documentation/x86/pagetable-isolation.txt for more details.
+
 config SECURITY_INFINIBAND
 	bool "Infiniband Security Hooks"
 	depends on SECURITY && INFINIBAND


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 59/60] x86/mm/dump_pagetables: Check user space page table for WX pages
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (7 preceding siblings ...)
  2017-12-04 14:08 ` [patch 57/60] x86/mm/kpti: Add Kconfig Thomas Gleixner
@ 2017-12-04 14:08 ` Thomas Gleixner
  2017-12-04 14:08 ` [patch 60/60] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:08 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, moritz.lipp, linux-mm,
	Dave Hansen, Borislav Petkov, michael.schwarz, richard.fellner

[-- Attachment #1: x86-mm-dump_pagetables--Check_shadow_page_table_for_WX_pages.patch --]
[-- Type: text/plain, Size: 3643 bytes --]

From: Thomas Gleixner <tglx@linutronix.de>

ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
but does not check the KERNEL_PAGE_TABLE_ISOLATION user space page table.

Restructure the code so that dmesg output is selected by an explicit
argument and not implicit via checking the pgd argument for !NULL.

Add the check for the user space page table.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at

---
 arch/x86/include/asm/pgtable.h |    1 +
 arch/x86/mm/debug_pagetables.c |    2 +-
 arch/x86/mm/dump_pagetables.c  |   30 +++++++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,6 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
 void ptdump_walk_pgd_level_checkwx(void);
 
 #ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
-	ptdump_walk_pgd_level(m, NULL);
+	ptdump_walk_pgd_level_debugfs(m, NULL);
 	return 0;
 }
 
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -447,7 +447,7 @@ static inline bool is_hypervisor_range(i
 }
 
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
-				       bool checkwx)
+				       bool checkwx, bool dmesg)
 {
 #ifdef CONFIG_X86_64
 	pgd_t *start = (pgd_t *) &init_top_pgt;
@@ -460,7 +460,7 @@ static void ptdump_walk_pgd_level_core(s
 
 	if (pgd) {
 		start = pgd;
-		st.to_dmesg = true;
+		st.to_dmesg = dmesg;
 	}
 
 	st.check_wx = checkwx;
@@ -498,13 +498,33 @@ static void ptdump_walk_pgd_level_core(s
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 {
-	ptdump_walk_pgd_level_core(m, pgd, false);
+	ptdump_walk_pgd_level_core(m, pgd, false, true);
+}
+
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+{
+	ptdump_walk_pgd_level_core(m, pgd, false, false);
+}
+EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
+
+static void ptdump_walk_user_pgd_level_checkwx(void)
+{
+#ifdef CONFIG_KERNEL_PAGE_TABLE_ISOLATION
+	pgd_t *pgd = (pgd_t *) &init_top_pgt;
+
+	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_KPTI))
+		return;
+
+	pr_info("x86/mm: Checking user space page tables\n");
+	pgd = kernel_to_user_pgdp(pgd);
+	ptdump_walk_pgd_level_core(NULL, pgd, true, false);
+#endif
 }
-EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level);
 
 void ptdump_walk_pgd_level_checkwx(void)
 {
-	ptdump_walk_pgd_level_core(NULL, NULL, true);
+	ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+	ptdump_walk_user_pgd_level_checkwx();
 }
 
 static int __init pt_dump_init(void)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 60/60] x86/mm/debug_pagetables: Allow dumping current pagetables
       [not found] <20171204140706.296109558@linutronix.de>
                   ` (8 preceding siblings ...)
  2017-12-04 14:08 ` [patch 59/60] x86/mm/dump_pagetables: Check user space page table for WX pages Thomas Gleixner
@ 2017-12-04 14:08 ` Thomas Gleixner
  9 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 14:08 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, daniel.gruss, moritz.lipp, linux-mm,
	Dave Hansen, Borislav Petkov, michael.schwarz, richard.fellner

[-- Attachment #1: x86-mm-debug_pagetables--Allow_dumping_current_pagetables.patch --]
[-- Type: text/plain, Size: 5143 bytes --]

From: Thomas Gleixner <tglx@linutronix.de>

Add two debugfs files which allow to dump the pagetable of the current
task.

current_kernel dumps the regular page table. This is the page table which
is normally shared between kernel and user space. If kernel page table
isolation is enabled this is the kernel space mapping.

If kernel page table isolation is enabled the second file, current_user,
dumps the user space page table.

These files allow to verify the resulting page tables for page table
isolation, but even in the normal case its useful to be able to inspect
user space page tables of current for debugging purposes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: keescook@google.com
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: moritz.lipp@iaik.tugraz.at
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: hughd@google.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: michael.schwarz@iaik.tugraz.at
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: richard.fellner@student.tugraz.at

---
 arch/x86/include/asm/pgtable.h |    2 -
 arch/x86/mm/debug_pagetables.c |   71 ++++++++++++++++++++++++++++++++++++++---
 arch/x86/mm/dump_pagetables.c  |    6 ++-
 3 files changed, 73 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,7 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
 void ptdump_walk_pgd_level_checkwx(void);
 
 #ifdef CONFIG_DEBUG_WX
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -5,7 +5,7 @@
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
-	ptdump_walk_pgd_level_debugfs(m, NULL);
+	ptdump_walk_pgd_level_debugfs(m, NULL, false);
 	return 0;
 }
 
@@ -22,7 +22,57 @@ static const struct file_operations ptdu
 	.release	= single_release,
 };
 
-static struct dentry *dir, *pe;
+static int ptdump_show_curknl(struct seq_file *m, void *v)
+{
+	if (current->mm->pgd) {
+		down_read(&current->mm->mmap_sem);
+		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+		up_read(&current->mm->mmap_sem);
+	}
+	return 0;
+}
+
+static int ptdump_open_curknl(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, ptdump_show_curknl, NULL);
+}
+
+static const struct file_operations ptdump_curknl_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ptdump_open_curknl,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#ifdef CONFIG_KERNEL_PAGE_TABLE_ISOLATION
+static struct dentry *pe_curusr;
+
+static int ptdump_show_curusr(struct seq_file *m, void *v)
+{
+	if (current->mm->pgd) {
+		down_read(&current->mm->mmap_sem);
+		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+		up_read(&current->mm->mmap_sem);
+	}
+	return 0;
+}
+
+static int ptdump_open_curusr(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, ptdump_show_curusr, NULL);
+}
+
+static const struct file_operations ptdump_curusr_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ptdump_open_curusr,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+#endif
+
+static struct dentry *dir, *pe_knl, *pe_curknl;
 
 static int __init pt_dump_debug_init(void)
 {
@@ -30,9 +80,22 @@ static int __init pt_dump_debug_init(voi
 	if (!dir)
 		return -ENOMEM;
 
-	pe = debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops);
-	if (!pe)
+	pe_knl = debugfs_create_file("kernel", 0400, dir, NULL,
+				     &ptdump_fops);
+	if (!pe_knl)
+		goto err;
+
+	pe_curknl = debugfs_create_file("current_kernel", 0400,
+					dir, NULL, &ptdump_curknl_fops);
+	if (!pe_curknl)
+		goto err;
+
+#ifdef CONFIG_KERNEL_PAGE_TABLE_ISOLATION
+	pe_curusr = debugfs_create_file("current_user", 0400,
+					dir, NULL, &ptdump_curusr_fops);
+	if (!pe_curusr)
 		goto err;
+#endif
 	return 0;
 err:
 	debugfs_remove_recursive(dir);
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -501,8 +501,12 @@ void ptdump_walk_pgd_level(struct seq_fi
 	ptdump_walk_pgd_level_core(m, pgd, false, true);
 }
 
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd)
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
 {
+#ifdef CONFIG_KERNEL_PAGE_TABLE_ISOLATION
+	if (user && static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_KPTI))
+		pgd = kernel_to_user_pgdp(pgd);
+#endif
 	ptdump_walk_pgd_level_core(m, pgd, false, false);
 }
 EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 57/60] x86/mm/kpti: Add Kconfig
  2017-12-04 14:08 ` [patch 57/60] x86/mm/kpti: Add Kconfig Thomas Gleixner
@ 2017-12-04 16:54   ` Andy Lutomirski
  2017-12-04 16:57     ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2017-12-04 16:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, Kees Cook, Hugh Dickins,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, Daniel Gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

On Mon, Dec 4, 2017 at 6:08 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> Finally allow CONFIG_KERNEL_PAGE_TABLE_ISOLATION to be enabled.
>
> PARAVIRT generally requires that the kernel not manage its own page tables.
> It also means that the hypervisor and kernel must agree wholeheartedly
> about what format the page tables are in and what they contain.
> KERNEL_PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they
> can not be used together.
>
> I've seen conflicting feedback from maintainers lately about whether they
> want the Kconfig magic to go first or last in a patch series.  It's going
> last here because the partially-applied series leads to kernels that can
> not boot in a bunch of cases.  I did a run through the entire series with
> CONFIG_KERNEL_PAGE_TABLE_ISOLATION=y to look for build errors, though.
>
> [ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ]
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: keescook@google.com
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: moritz.lipp@iaik.tugraz.at
> Cc: linux-mm@kvack.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: hughd@google.com
> Cc: daniel.gruss@iaik.tugraz.at
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: michael.schwarz@iaik.tugraz.at
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: richard.fellner@student.tugraz.at
> Link: https://lkml.kernel.org/r/20171123003524.88C90659@viggo.jf.intel.com
>
> ---
>  security/Kconfig |   10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -54,6 +54,16 @@ config SECURITY_NETWORK
>           implement socket and networking access controls.
>           If you are unsure how to answer this question, answer N.
>
> +config KERNEL_PAGE_TABLE_ISOLATION
> +       bool "Remove the kernel mapping in user mode"
> +       depends on X86_64 && JUMP_LABEL

select JUMP_LABEL perhaps?

> +       help
> +         This feature reduces the number of hardware side channels by
> +         ensuring that the majority of kernel addresses are not mapped
> +         into userspace.
> +
> +         See Documentation/x86/pagetable-isolation.txt for more details.
> +
>  config SECURITY_INFINIBAND
>         bool "Infiniband Security Hooks"
>         depends on SECURITY && INFINIBAND
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 57/60] x86/mm/kpti: Add Kconfig
  2017-12-04 16:54   ` Andy Lutomirski
@ 2017-12-04 16:57     ` Thomas Gleixner
  2017-12-05  9:34       ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-04 16:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Peter Zijlstra, Dave Hansen,
	Borislav Petkov, Greg KH, Kees Cook, Hugh Dickins, Brian Gerst,
	Josh Poimboeuf, Denys Vlasenko, Rik van Riel, Boris Ostrovsky,
	Juergen Gross, David Laight, Eduardo Valentin, aliguori,
	Will Deacon, Daniel Gruss, Dave Hansen, Ingo Molnar, moritz.lipp,
	linux-mm, Borislav Petkov, michael.schwarz, richard.fellner

On Mon, 4 Dec 2017, Andy Lutomirski wrote:
> On Mon, Dec 4, 2017 at 6:08 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -54,6 +54,16 @@ config SECURITY_NETWORK
> >           implement socket and networking access controls.
> >           If you are unsure how to answer this question, answer N.
> >
> > +config KERNEL_PAGE_TABLE_ISOLATION
> > +       bool "Remove the kernel mapping in user mode"
> > +       depends on X86_64 && JUMP_LABEL
> 
> select JUMP_LABEL perhaps?

Silly me. Yes.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 56/60] x86/mm/kpti: Disable native VSYSCALL
  2017-12-04 14:08 ` [patch 56/60] x86/mm/kpti: Disable native VSYSCALL Thomas Gleixner
@ 2017-12-04 22:33   ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2017-12-04 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, Kees Cook, Hugh Dickins,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, Juergen Gross, David Laight, Eduardo Valentin,
	aliguori, Will Deacon, Daniel Gruss, Dave Hansen, Ingo Molnar,
	moritz.lipp, linux-mm, Borislav Petkov, michael.schwarz,
	richard.fellner

On Mon, Dec 4, 2017 at 6:08 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> The KERNEL_PAGE_TABLE_ISOLATION code attempts to "poison" the user
> portion of the kernel page tables. It detects entries that it wants that it
> wants to poison in two ways:
>
>  * Looking for addresses >= PAGE_OFFSET
>
>  * Looking for entries without _PAGE_USER set
>
> But, to allow the _PAGE_USER check to work, it must never be set on
> init_mm entries, and an earlier patch in this series ensured that it
> will never be set.
>
> The VDSO is at a address >= PAGE_OFFSET and it is also mapped by init_mm.
> Because of the earlier, KERNEL_PAGE_TABLE_ISOLATION-enforced restriction,
> _PAGE_USER is never set which makes the VDSO unreadable to userspace.
>
> This makes the "NATIVE" case totally unusable since userspace can not even
> see the memory any more.  Disable it whenever KERNEL_PAGE_TABLE_ISOLATION
> is enabled.
>
> Also add some help text about how KERNEL_PAGE_TABLE_ISOLATION might
> affect the emulation case as well.
>

I think my other suggestion may obsolete this patch.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 57/60] x86/mm/kpti: Add Kconfig
  2017-12-04 16:57     ` Thomas Gleixner
@ 2017-12-05  9:34       ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2017-12-05  9:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Peter Zijlstra, Dave Hansen,
	Borislav Petkov, Greg KH, Kees Cook, Hugh Dickins, Brian Gerst,
	Josh Poimboeuf, Denys Vlasenko, Rik van Riel, Boris Ostrovsky,
	Juergen Gross, David Laight, Eduardo Valentin, aliguori,
	Will Deacon, Daniel Gruss, Dave Hansen, Ingo Molnar, moritz.lipp,
	linux-mm, Borislav Petkov, michael.schwarz, richard.fellner

On Mon, 4 Dec 2017, Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Andy Lutomirski wrote:
> > On Mon, Dec 4, 2017 at 6:08 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > --- a/security/Kconfig
> > > +++ b/security/Kconfig
> > > @@ -54,6 +54,16 @@ config SECURITY_NETWORK
> > >           implement socket and networking access controls.
> > >           If you are unsure how to answer this question, answer N.
> > >
> > > +config KERNEL_PAGE_TABLE_ISOLATION
> > > +       bool "Remove the kernel mapping in user mode"
> > > +       depends on X86_64 && JUMP_LABEL
> > 
> > select JUMP_LABEL perhaps?
> 
> Silly me. Yes.

Peter just pointed out that we switched everything to cpu_has() which is
using alternatives so jump label is not longer required at all.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 24/60] x86/paravirt: Dont patch flush_tlb_single
  2017-12-04 14:07 ` [patch 24/60] x86/paravirt: Dont patch flush_tlb_single Thomas Gleixner
@ 2017-12-05 12:18   ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2017-12-05 12:18 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Greg KH, keescook, hughd,
	Brian Gerst, Josh Poimboeuf, Denys Vlasenko, Rik van Riel,
	Boris Ostrovsky, David Laight, Eduardo Valentin, aliguori,
	Will Deacon, daniel.gruss, Dave Hansen, michael.schwarz,
	linux-mm, Borislav Petkov, moritz.lipp, richard.fellner

On 04/12/17 15:07, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> native_flush_tlb_single() will be changed with the upcoming
> KERNEL_PAGE_TABLE_ISOLATION feature. This requires to have more code in
> there than INVLPG.
> 
> Remove the paravirt patching for it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y
  2017-12-04 14:07 ` [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y Thomas Gleixner
@ 2017-12-05 14:34   ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2017-12-05 14:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirsky, Peter Zijlstra,
	Dave Hansen, Greg KH, keescook, hughd, Brian Gerst,
	Josh Poimboeuf, Denys Vlasenko, Rik van Riel, Boris Ostrovsky,
	Juergen Gross, David Laight, Eduardo Valentin, aliguori,
	Will Deacon, daniel.gruss, Dave Hansen, Ingo Molnar, moritz.lipp,
	linux-mm, richard.fellner, michael.schwarz

On Mon, Dec 04, 2017 at 03:07:34PM +0100, Thomas Gleixner wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Global pages stay in the TLB across context switches.  Since all contexts
> share the same kernel mapping, these mappings are marked as global pages
> so kernel entries in the TLB are not flushed out on a context switch.
> 
> But, even having these entries in the TLB opens up something that an
> attacker can use, such as the double-page-fault attack:
> 
>    http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
> 
> That means that even when KERNEL_PAGE_TABLE_ISOLATION switches page tables
> on return to user space the global pages would stay in the TLB cache.
> 
> Disable global pages so that kernel TLB entries can be flushed before
> returning to user space. This way, all accesses to kernel addresses from
> userspace result in a TLB miss independent of the existence of a kernel
> mapping.
> 
> Supress global pages via the __supported_pte_mask. The user space

"Suppress"

Otherwise

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix ImendA?rffer, Jane Smithard, Graham Norton, HRB 21284 (AG NA 1/4 rnberg)
-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-12-05 14:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20171204140706.296109558@linutronix.de>
2017-12-04 14:07 ` [patch 24/60] x86/paravirt: Dont patch flush_tlb_single Thomas Gleixner
2017-12-05 12:18   ` Juergen Gross
2017-12-04 14:07 ` [patch 28/60] x86/mm/kpti: Disable global pages if KERNEL_PAGE_TABLE_ISOLATION=y Thomas Gleixner
2017-12-05 14:34   ` Borislav Petkov
2017-12-04 14:07 ` [patch 29/60] x86/mm/kpti: Prepare the x86/entry assembly code for entry/exit CR3 switching Thomas Gleixner
2017-12-04 14:07 ` [patch 48/60] x86/mm: Move the CR3 construction functions to tlbflush.h Thomas Gleixner
2017-12-04 14:07 ` [patch 49/60] x86/mm: Remove hard-coded ASID limit checks Thomas Gleixner
2017-12-04 14:07 ` [patch 50/60] x86/mm: Put MMU to hardware ASID translation in one place Thomas Gleixner
2017-12-04 14:08 ` [patch 56/60] x86/mm/kpti: Disable native VSYSCALL Thomas Gleixner
2017-12-04 22:33   ` Andy Lutomirski
2017-12-04 14:08 ` [patch 57/60] x86/mm/kpti: Add Kconfig Thomas Gleixner
2017-12-04 16:54   ` Andy Lutomirski
2017-12-04 16:57     ` Thomas Gleixner
2017-12-05  9:34       ` Thomas Gleixner
2017-12-04 14:08 ` [patch 59/60] x86/mm/dump_pagetables: Check user space page table for WX pages Thomas Gleixner
2017-12-04 14:08 ` [patch 60/60] x86/mm/debug_pagetables: Allow dumping current pagetables Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).