From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@linux.intel.com,
	moritz.lipp@iaik.tugraz.at, daniel.gruss@iaik.tugraz.at,
	michael.schwarz@iaik.tugraz.at,
	richard.fellner@student.tugraz.at, luto@kernel.org,
	torvalds@linux-foundation.org, keescook@google.com,
	hughd@google.com, x86@kernel.org
Subject: [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching
Date: Fri, 10 Nov 2017 11:31:07 -0800
Message-ID: <20171110193107.67B798C3@viggo.jf.intel.com>
In-Reply-To: <20171110193058.BECA7D88@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

This is largely code from Andy Lutomirski.  I fixed a few bugs
in it, and added a few SWITCH_TO_* spots.

KAISER needs to switch to a different CR3 value when it enters
the kernel and switch back when it exits.  This essentially
needs to be done in assembly: on entry, no C code can be called
until the kernel CR3 is in place.

This is extra challenging because the switching context is
tricky: the set of registers that can be clobbered varies from
one entry point to the next.  It is also hard to store things on
the stack because either there is an established ABI (pt_regs)
or the stack is entirely unsafe to use.

This patch establishes a set of macros that allow changing to
the user and kernel CR3 values.
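
As a rough illustration of what the ADJUST_*_CR3 macros compute, here
is a minimal C sketch of the same bit manipulation.  It assumes 4 KiB
pages (PAGE_SHIFT of 12); the function names are hypothetical and only
model the arithmetic, they do not touch the real CR3 register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so PAGE_SHIFT is 12 as on x86-64. */
#define PAGE_SHIFT 12
#define KAISER_SWITCH_MASK (UINT64_C(1) << PAGE_SHIFT)

/* Models ADJUST_KERNEL_CR3: clear the switch bit (kernel page tables). */
static inline uint64_t adjust_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~KAISER_SWITCH_MASK;
}

/* Models ADJUST_USER_CR3: set the switch bit (user page tables). */
static inline uint64_t adjust_user_cr3(uint64_t cr3)
{
	return cr3 | KAISER_SWITCH_MASK;
}
```

Because the two PGD halves differ only in that one bit, both
operations are idempotent: adjusting a value that already points at
the requested half leaves it unchanged.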

Interactions with SWAPGS: previous versions of the KAISER code
relied on having per-cpu scratch space to save/restore a register
that can be used for the CR3 MOV.  The %GS register is used to
index into our per-cpu space, so SWAPGS *had* to be done before
the CR3 switch.  That scratch space is gone now, but the semantic
that SWAPGS must be done before the CR3 MOV is retained.  This is
good to keep because it is not that hard to do and it allows us
to do things like add per-cpu debugging information to help us
figure out what goes wrong sometimes.

What this does in the NMI code is worth pointing out.  NMIs
can interrupt *any* context and they can also be nested with
NMIs interrupting other NMIs.  The comments below
".Lnmi_from_kernel" explain the format of the stack during this
situation.  Changing the format of this stack is not a fun
exercise: I tried.  Instead of storing the old CR3 value on the
stack, this patch depends on the *regular* register save/restore
mechanism and then uses %r14 to keep CR3 during the NMI.  %r14 is
callee-saved and will not be clobbered by the C NMI handlers that
get called.
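
The save/restore pairing used on the paranoid and NMI paths can be
sketched in C roughly as follows.  This is only a model under stated
assumptions: sim_cr3 is a hypothetical variable standing in for the
hardware register, the return value plays the role of %r14, and
PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so PAGE_SHIFT is 12 as on x86-64. */
#define PAGE_SHIFT 12
#define KAISER_SWITCH_MASK (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical stand-in for the hardware CR3 register. */
static uint64_t sim_cr3;

/*
 * Models SAVE_AND_SWITCH_TO_KERNEL_CR3: returns the interrupted
 * context's CR3 (kept in %r14 by the patch) and writes CR3 only if
 * the user page tables were active.
 */
static uint64_t save_and_switch_to_kernel_cr3(void)
{
	uint64_t saved = sim_cr3;

	if (saved & KAISER_SWITCH_MASK)
		sim_cr3 = saved & ~KAISER_SWITCH_MASK;
	return saved;
}

/* Models RESTORE_CR3: unconditionally write the saved value back. */
static void restore_cr3(uint64_t saved)
{
	sim_cr3 = saved;
}
```

Whichever CR3 was live when the NMI arrived, kernel or user, is the
one that is put back afterwards, so the handler does not need to know
which context it interrupted.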

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: x86@kernel.org
---

 b/arch/x86/entry/calling.h         |   65 +++++++++++++++++++++++++++++++++++++
 b/arch/x86/entry/entry_64.S        |   34 ++++++++++++++++---
 b/arch/x86/entry/entry_64_compat.S |    8 ++++
 3 files changed, 102 insertions(+), 5 deletions(-)

diff -puN arch/x86/entry/calling.h~kaiser-luto-base-cr3-work arch/x86/entry/calling.h
--- a/arch/x86/entry/calling.h~kaiser-luto-base-cr3-work	2017-11-10 11:22:07.191244954 -0800
+++ b/arch/x86/entry/calling.h	2017-11-10 11:22:07.198244954 -0800
@@ -1,5 +1,6 @@
 #include <linux/jump_label.h>
 #include <asm/unwind_hints.h>
+#include <asm/cpufeatures.h>
 
 /*
 
@@ -186,6 +187,70 @@ For 32-bit we have the following convent
 #endif
 .endm
 
+#ifdef CONFIG_KAISER
+
+/* KAISER PGDs are 8k.  We flip bit 12 to switch between the two halves: */
+#define KAISER_SWITCH_MASK (1<<PAGE_SHIFT)
+
+.macro ADJUST_KERNEL_CR3 reg:req
+	/* Clear "KAISER bit", point CR3 at kernel pagetables: */
+	andq	$(~KAISER_SWITCH_MASK), \reg
+.endm
+
+.macro ADJUST_USER_CR3 reg:req
+	/* Move CR3 up a page to the user page tables: */
+	orq	$(KAISER_SWITCH_MASK), \reg
+.endm
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	mov	%cr3, \scratch_reg
+	ADJUST_KERNEL_CR3 \scratch_reg
+	mov	\scratch_reg, %cr3
+.endm
+
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+	mov	%cr3, \scratch_reg
+	ADJUST_USER_CR3 \scratch_reg
+	mov	\scratch_reg, %cr3
+.endm
+
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+	movq	%cr3, %r\scratch_reg
+	movq	%r\scratch_reg, \save_reg
+	/*
+	 * Is the switch bit zero?  Then CR3 already points at the
+	 * kernel page tables and no CR3 write is needed.
+	 */
+	testq	$(KAISER_SWITCH_MASK), %r\scratch_reg
+	jz	.Ldone_\@
+
+	ADJUST_KERNEL_CR3 %r\scratch_reg
+	movq	%r\scratch_reg, %cr3
+
+.Ldone_\@:
+.endm
+
+.macro RESTORE_CR3 save_reg:req
+	/*
+	 * We could avoid the CR3 write if not changing its value,
+	 * but that requires a CR3 read *and* a scratch register.
+	 */
+	movq	\save_reg, %cr3
+.endm
+
+#else /* CONFIG_KAISER=n: */
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+.endm
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.endm
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+.endm
+.macro RESTORE_CR3 save_reg:req
+.endm
+
+#endif
+
 #endif /* CONFIG_X86_64 */
 
 /*
diff -puN arch/x86/entry/entry_64_compat.S~kaiser-luto-base-cr3-work arch/x86/entry/entry_64_compat.S
--- a/arch/x86/entry/entry_64_compat.S~kaiser-luto-base-cr3-work	2017-11-10 11:22:07.193244954 -0800
+++ b/arch/x86/entry/entry_64_compat.S	2017-11-10 11:22:07.198244954 -0800
@@ -91,6 +91,9 @@ ENTRY(entry_SYSENTER_compat)
 	pushq   $0			/* pt_regs->r15 = 0 */
 	cld
 
+	/* We just saved all the registers, so safe to clobber %rdi */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
 	 * ourselves.  To save a few cycles, we can check whether
@@ -214,6 +217,8 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
 	pushq   $0			/* pt_regs->r14 = 0 */
 	pushq   $0			/* pt_regs->r15 = 0 */
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
 	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
@@ -240,6 +245,7 @@ sysret32_from_system_call:
 	popq	%rsi			/* pt_regs->si */
 	popq	%rdi			/* pt_regs->di */
 
+	SWITCH_TO_USER_CR3 scratch_reg=%r8
         /*
          * USERGS_SYSRET32 does:
          *  GSBASE = user's GS base
@@ -324,6 +330,8 @@ ENTRY(entry_INT80_compat)
 	pushq   %r15                    /* pt_regs->r15 */
 	cld
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
 	movq	%rsp, %rdi			/* pt_regs pointer */
 	call	sync_regs
 	movq	%rax, %rsp			/* switch stack */
diff -puN arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work arch/x86/entry/entry_64.S
--- a/arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work	2017-11-10 11:22:07.194244954 -0800
+++ b/arch/x86/entry/entry_64.S	2017-11-10 11:22:07.199244954 -0800
@@ -147,8 +147,6 @@ ENTRY(entry_SYSCALL_64)
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-	TRACE_IRQS_OFF
-
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS			/* pt_regs->ss */
 	pushq	PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
@@ -169,6 +167,13 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 	UNWIND_HINT_REGS extra=0
 
+	/* NB: right here, all regs except r11 are live. */
+
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
+	/* Must wait until we have the kernel CR3 to call C functions: */
+	TRACE_IRQS_OFF
+
 	/*
 	 * If we need to do entry work or if we guess we'll need to do
 	 * exit work, go straight to the slow path.
@@ -340,6 +345,7 @@ syscall_return_via_sysret:
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
+	SWITCH_TO_USER_CR3 scratch_reg=%rdi
 
 	popq	%rdi
 	popq	%rsp
@@ -679,6 +685,8 @@ GLOBAL(swapgs_restore_regs_and_return_to
 	 * We can do future final exit work right here.
 	 */
 
+	SWITCH_TO_USER_CR3 scratch_reg=%rdi
+
 	/* Restore RDI. */
 	popq	%rdi
 	SWAPGS
@@ -1167,7 +1175,11 @@ ENTRY(paranoid_entry)
 	js	1f				/* negative -> in kernel */
 	SWAPGS
 	xorl	%ebx, %ebx
-1:	ret
+
+1:
+	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=ax save_reg=%r14
+
+	ret
 END(paranoid_entry)
 
 /*
@@ -1189,6 +1201,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	RESTORE_CR3	%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
 .Lparanoid_exit_no_swapgs:
@@ -1217,6 +1230,9 @@ ENTRY(error_entry)
 	 */
 	SWAPGS
 
+	/* We have user CR3.  Change to kernel CR3. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+
 .Lerror_entry_from_usermode_after_swapgs:
 	/*
 	 * We need to tell lockdep that IRQs are off.  We can't do this until
@@ -1263,9 +1279,10 @@ ENTRY(error_entry)
 
 .Lerror_bad_iret:
 	/*
-	 * We came from an IRET to user mode, so we have user gsbase.
-	 * Switch to kernel gsbase:
+	 * We came from an IRET to user mode, so we have user
+	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
 	 */
 	SWAPGS
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
 	/*
@@ -1298,6 +1315,10 @@ END(error_exit)
 /*
  * Runs on exception stack.  Xen PV does not go through this path at all,
  * so we can use real assembly here.
+ *
+ * Registers:
+ *	%r14: Used to save/restore the CR3 of the interrupted context
+ *	      when KAISER is in use.  Do not clobber.
  */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
@@ -1389,6 +1410,7 @@ ENTRY(nmi)
 	UNWIND_HINT_REGS
 	ENCODE_FRAME_POINTER
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
 	/*
 	 * At this point we no longer need to worry about stack damage
 	 * due to nesting -- we're on the normal thread stack and we're
@@ -1613,6 +1635,8 @@ end_repeat_nmi:
 	movq	$-1, %rsi
 	call	do_nmi
 
+	RESTORE_CR3 save_reg=%r14
+
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	nmi_restore
 nmi_swapgs:
_


