From: Ingo Molnar <mingo@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 15/43] x86/entry/64: Create a percpu SYSCALL entry trampoline
Date: Fri, 24 Nov 2017 18:23:43 +0100
Message-ID: <20171124172411.19476-16-mingo@kernel.org>
In-Reply-To: <20171124172411.19476-1-mingo@kernel.org>

From: Andy Lutomirski <luto@kernel.org>

Handling SYSCALL is tricky: the SYSCALL handler is entered with every
single register (except FLAGS), including RSP, live.  It somehow needs
to set RSP to point to a valid stack, which means it needs to save the
user RSP somewhere and find its own stack pointer.  The canonical way
to do this is with SWAPGS, which lets us access percpu data using the
%gs prefix.

With KAISER-like pagetable switching, this is problematic.  Without a
scratch register, switching CR3 is impossible, so %gs-based percpu
memory would need to be mapped in the user pagetables.  Doing that
without information leaks is difficult or impossible.

Instead, use a different sneaky trick.  Map a copy of the first part
of the SYSCALL asm at a different address for each CPU.  Now RIP
varies depending on the CPU, so we can use RIP-relative memory access
to access percpu memory.  By putting the relevant information (one
scratch slot and the stack address) at a constant offset relative to
RIP, we can make SYSCALL work without relying on %gs.

A nice thing about this approach is that we can easily switch it on
and off if we want pagetable switching to be configurable.

The compat variant of SYSCALL doesn't have this problem in the first
place -- there are plenty of scratch registers, since we don't care
about preserving r8-r15.  This patch therefore doesn't touch SYSCALL32
at all.

XXX: Whenever we settle how KAISER gets turned on and off, we should do
the same to this.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/b95ccae0a5a2f090c901e49fce7c9e8ff6acd40d.1511497875.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
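
Note (below the cut line, not part of the commit message): the address
arithmetic behind the RIP-relative trick can be modeled in user-space C.
This is a minimal sketch under stated assumptions, not kernel code. The
tss placeholder size, NR_CPUS value, the static areas[] array, and the
helper names area_from_trampoline() and lstar_for_cpu() are all
hypothetical; only the struct layout idea (tss followed by a one-page
entry_trampoline) and the two formulas come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NR_CPUS   4u	/* hypothetical CPU count for this model */

/* Simplified stand-in for the kernel's struct cpu_entry_area. */
struct cpu_entry_area {
	char tss[3 * PAGE_SIZE];		/* placeholder, not the real tss_struct */
	char entry_trampoline[PAGE_SIZE];	/* per-CPU alias of the trampoline page */
};

/* One area per CPU, each at a distinct address (models the fixmap slots). */
static struct cpu_entry_area areas[NR_CPUS];

/*
 * Models the CPU_ENTRY_AREA macro: starting from the address at which this
 * CPU's trampoline copy begins, step back by the field offset to reach the
 * start of the enclosing cpu_entry_area.
 */
static struct cpu_entry_area *area_from_trampoline(char *trampoline_start)
{
	return (struct cpu_entry_area *)
		(trampoline_start - offsetof(struct cpu_entry_area, entry_trampoline));
}

/*
 * Models the MSR_LSTAR value computed in syscall_init(): the per-CPU
 * trampoline base plus the offset of the entry symbol inside the
 * trampoline section (entry_SYSCALL_64_trampoline - _entry_trampoline).
 */
static uintptr_t lstar_for_cpu(unsigned int cpu, size_t entry_offset)
{
	assert(cpu < NR_CPUS);
	assert(entry_offset < PAGE_SIZE);
	return (uintptr_t)areas[cpu].entry_trampoline + entry_offset;
}
```

Because every CPU enters through a different alias of the same page, the
same instruction bytes resolve to a different cpu_entry_area on each CPU:
area_from_trampoline(areas[n].entry_trampoline) yields &areas[n] for any n.
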
 arch/x86/entry/entry_64.S     | 48 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/fixmap.h |  2 ++
 arch/x86/kernel/asm-offsets.c |  1 +
 arch/x86/kernel/cpu/common.c  | 12 ++++++++++-
 arch/x86/kernel/vmlinux.lds.S | 10 +++++++++
 5 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 426b8c669d6a..0cde243b7542 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -140,6 +140,54 @@ END(native_usergs_sysret64)
  * with them due to bugs in both AMD and Intel CPUs.
  */
 
+	.pushsection .entry_trampoline, "ax"
+
+/*
+ * The code in here gets remapped into cpu_entry_area's trampoline.  This means
+ * that the assembler and linker have the wrong idea as to where this code
+ * lives (and, in fact, it's mapped more than once, so it's not even at a
+ * fixed address).  So we can't reference any symbols outside the entry
+ * trampoline and expect it to work.
+ *
+ * Instead, we carefully abuse %rip-relative addressing.
+ * _entry_trampoline(%rip) refers to the start of the (remapped) entry
+ * trampoline.  We can thus find cpu_entry_area with this macro:
+ */
+
+#define CPU_ENTRY_AREA \
+	_entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip)
+
+/* The top word of the SYSENTER stack is hot and is usable as scratch space. */
+#define RSP_SCRATCH CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + \
+	SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA
+
+ENTRY(entry_SYSCALL_64_trampoline)
+	UNWIND_HINT_EMPTY
+	swapgs
+
+	/* Stash the user RSP. */
+	movq	%rsp, RSP_SCRATCH
+
+	/* Load the top of the task stack into RSP */
+	movq	CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+
+	/* Start building the simulated IRET frame. */
+	pushq	$__USER_DS			/* pt_regs->ss */
+	pushq	RSP_SCRATCH			/* pt_regs->sp */
+	pushq	%r11				/* pt_regs->flags */
+	pushq	$__USER_CS			/* pt_regs->cs */
+	pushq	%rcx				/* pt_regs->ip */
+
+	/*
+	 * x86 lacks a near absolute jump, and we can't jump to the real
+	 * entry text with a relative jump, so we fake it using retq.
+	 */
+	pushq	$entry_SYSCALL_64_after_hwframe
+	retq
+END(entry_SYSCALL_64_trampoline)
+
+	.popsection
+
 ENTRY(entry_SYSCALL_64)
 	UNWIND_HINT_EMPTY
 	/*
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 3a42da14c2cb..7eb1b5490395 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -58,6 +58,8 @@ struct cpu_entry_area {
 	 * of the TSS region.
 	 */
 	struct tss_struct tss;
+
+	char entry_trampoline[PAGE_SIZE];
 };
 
 #define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 55858b277cf6..61b1af88ac07 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -101,4 +101,5 @@ void common(void) {
 
 	/* Layout info for cpu_entry_area */
 	OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss);
+	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 }
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7c82a8a8bfda..5a05db084659 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -507,6 +507,8 @@ DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 static inline void setup_cpu_entry_area(int cpu)
 {
 #ifdef CONFIG_X86_64
+	extern char _entry_trampoline[];
+
 	/* On 64-bit systems, we use a read-only fixmap GDT. */
 	pgprot_t gdt_prot = PAGE_KERNEL_RO;
 #else
@@ -553,6 +555,11 @@ static inline void setup_cpu_entry_area(int cpu)
 #ifdef CONFIG_X86_32
 	this_cpu_write(cpu_entry_area, get_cpu_entry_area(cpu));
 #endif
+
+#ifdef CONFIG_X86_64
+	__set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline),
+		     __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
+#endif
 }
 
 /* Load the original GDT from the per-cpu structure */
@@ -1417,10 +1424,13 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
+	extern char _entry_trampoline[];
+	extern char entry_SYSCALL_64_trampoline[];
+
 	int cpu = smp_processor_id();
 
 	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
-	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+	wrmsrl(MSR_LSTAR, (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline + (entry_SYSCALL_64_trampoline - _entry_trampoline));
 
 #ifdef CONFIG_IA32_EMULATION
 	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index a4009fb9be87..2738cfb6c8c8 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -107,6 +107,16 @@ SECTIONS
 		SOFTIRQENTRY_TEXT
 		*(.fixup)
 		*(.gnu.warning)
+
+#ifdef CONFIG_X86_64
+		/* Entry trampoline */
+		. = ALIGN(PAGE_SIZE);
+		_entry_trampoline = .;
+		*(.entry_trampoline)
+		. = ALIGN(PAGE_SIZE);
+		ASSERT(. - _entry_trampoline == PAGE_SIZE, "entry trampoline is too big");
+#endif
+
 		/* End of text section */
 		_etext = .;
 	} :text = 0x9090
-- 
2.14.1
