linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
@ 2018-01-22 18:04 Andy Lutomirski
  2018-01-22 18:55 ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-22 18:04 UTC (permalink / raw)
  To: x86, LKML
  Cc: Linus Torvalds, Greg Kroah-Hartman, Alan Cox, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov,
	Andy Lutomirski

The existing retpoline code carefully and awkwardly retpolinifies
the SYSCALL64 slow path.  This stops the fast path from being
particularly fast, and it's IMO rather messy.

Instead, just bypass the fast path entirely if retpolines are on.
This seems to be a speedup on a "minimal" retpoline kernel, mainly
because do_syscall_64() ends up calling the syscall handler without
using a slow retpoline thunk.

As an added benefit, we won't need to apply further Spectre
mitigations to the fast path.  The current fast path spectre
mitigations may have a hole: if the syscall nr is out of bounds, it
is plausible that the CPU would mispredict the bounds check,
load a bogus function pointer, and speculatively execute it right
through the retpoline.  If this is indeed a problem, we need to fix
it in the slow paths anyway, but with this patch applied, we can at
least leave the fast path alone.
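
As a rough C sketch (not kernel code; the names here are stand-ins
for the real entry code), the worrying shape is:

#define NR_SYSCALL_MAX 332	/* illustrative value only */

typedef long (*syscall_fn)(long, long, long, long, long, long);
extern const syscall_fn sys_call_table[];

long dispatch(unsigned long nr, long a1, long a2, long a3,
	      long a4, long a5, long a6)
{
	/*
	 * If this check is mispredicted, the load and indirect call
	 * below can execute speculatively with an out-of-bounds,
	 * attacker-influenced function pointer.
	 */
	if (nr <= NR_SYSCALL_MAX)
		return sys_call_table[nr](a1, a2, a3, a4, a5, a6);
	return -38;	/* -ENOSYS */
}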

Cleans-up: 2641f08bb7fc ("x86/retpoline/entry: Convert entry assembler indirect jumps")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4f8e1d35a97c..b915bad58754 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -245,6 +245,9 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	 * If we need to do entry work or if we guess we'll need to do
 	 * exit work, go straight to the slow path.
 	 */
+#ifdef CONFIG_RETPOLINE
+	ALTERNATIVE "", "jmp entry_SYSCALL64_slow_path", X86_FEATURE_RETPOLINE
+#endif
 	movq	PER_CPU_VAR(current_task), %r11
 	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
 	jnz	entry_SYSCALL64_slow_path
@@ -270,13 +273,11 @@ entry_SYSCALL_64_fastpath:
 	 * This call instruction is handled specially in stub_ptregs_64.
 	 * It might end up jumping to the slow path.  If it jumps, RAX
 	 * and all argument registers are clobbered.
+	 *
+	 * NB: no retpoline needed -- we don't execute this code with
+	 * retpolines enabled.
 	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
 	call	*sys_call_table(, %rax, 8)
-#endif
 .Lentry_SYSCALL_64_after_fastpath_call:
 
 	movq	%rax, RAX(%rsp)
@@ -431,6 +432,9 @@ ENTRY(stub_ptregs_64)
 	 * which we achieve by trying again on the slow path.  If we are on
 	 * the slow path, the extra regs are already saved.
 	 *
+	 * This code is unreachable (even via mispredicted conditional branches)
+	 * if we're using retpolines.
+	 *
 	 * RAX stores a pointer to the C function implementing the syscall.
 	 * IRQs are on.
 	 */
@@ -448,12 +452,19 @@ ENTRY(stub_ptregs_64)
 	jmp	entry_SYSCALL64_slow_path
 
 1:
-	JMP_NOSPEC %rax				/* Called from C */
+	jmp	*%rax				/* Called from C */
 END(stub_ptregs_64)
 
 .macro ptregs_stub func
 ENTRY(ptregs_\func)
 	UNWIND_HINT_FUNC
+#ifdef CONFIG_RETPOLINE
+	/*
+	 * If retpolines are enabled, we don't use the syscall fast path,
+	 * so just jump straight to the syscall body.
+	 */
+	ALTERNATIVE "", __stringify(jmp \func), X86_FEATURE_RETPOLINE
+#endif
 	leaq	\func(%rip), %rax
 	jmp	stub_ptregs_64
 END(ptregs_\func)
-- 
2.13.6

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-22 18:04 [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on Andy Lutomirski
@ 2018-01-22 18:55 ` Linus Torvalds
  2018-01-23  8:01   ` Ingo Molnar
                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-01-22 18:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

On Mon, Jan 22, 2018 at 10:04 AM, Andy Lutomirski <luto@kernel.org> wrote:
> The existing retpoline code carefully and awkwardly retpolinifies
> the SYSCALL64 slow path.  This stops the fast path from being
> particularly fast, and it's IMO rather messy.

I'm not convinced your patch isn't messier still... It's certainly
subtle. I had to look at that ptregs stub generator thing twice.

Honestly, I'd rather get rid of the fast-path entirely. Compared to
all the PTI mess, it's not even noticeable.

And if we ever get CPUs that have this all fixed, we can revisit
introducing the fastpath. But this is all very messy and it doesn't
seem worth it right now.

If we get rid of the fastpath, we can lay out the slow path slightly
better, and get rid of some of those jump-overs. And we'd get rid of
the ptregs hooks entirely.

So we can try to make the "slow" path better while at it, but I really
don't think it matters much now in the post-PTI era. Sadly.

                     Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-22 18:55 ` Linus Torvalds
@ 2018-01-23  8:01   ` Ingo Molnar
  2018-01-23 18:36   ` Alan Cox
  2018-01-25 18:48   ` Linus Torvalds
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2018-01-23  8:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Jan 22, 2018 at 10:04 AM, Andy Lutomirski <luto@kernel.org> wrote:
> > The existing retpoline code carefully and awkwardly retpolinifies
> > the SYSCALL64 slow path.  This stops the fast path from being
> > particularly fast, and it's IMO rather messy.
> 
> I'm not convinced your patch isn't messier still... It's certainly
> subtle. I had to look at that ptregs stub generator thing twice.
> 
> Honestly, I'd rather get rid of the fast-path entirely. Compared to
> all the PTI mess, it's not even noticeable.
> 
> And if we ever get CPUs that have this all fixed, we can revisit
> introducing the fastpath. But this is all very messy and it doesn't
> seem worth it right now.
> 
> If we get rid of the fastpath, we can lay out the slow path slightly
> better, and get rid of some of those jump-overs. And we'd get rid of
> the ptregs hooks entirely.
> 
> So we can try to make the "slow" path better while at it, but I really
> don't think it matters much now in the post-PTI era. Sadly.

Note that there's another advantage to your proposal: should other vulnerabilities 
arise in the future, requiring changes in the syscall entry path, we'd be more 
flexible to address them in the C space than in the assembly space.

In hindsight a _LOT_ of the PTI complexity and fragility centered around 
interacting with x86 kernel entry assembly code - which entry code fortunately got 
much simpler (and easier to review) in the past 1-2 years due to the thorough 
cleanups and the conversion of most of it to C. But it was still painful.

So I'm fully in favor of that.

Thanks,

	Ingo

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-22 18:55 ` Linus Torvalds
  2018-01-23  8:01   ` Ingo Molnar
@ 2018-01-23 18:36   ` Alan Cox
  2018-01-25 18:48   ` Linus Torvalds
  2 siblings, 0 replies; 28+ messages in thread
From: Alan Cox @ 2018-01-23 18:36 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

> And if we ever get CPUs that have this all fixed, we can revisit
> introducing the fastpath. But this is all very messy and it doesn't
> seem worth it right now.

We already have CPUs it doesn't affect - older Atom for one and I
believe some or all of the VIA ones (still trying to get a definitive
confirmation).

Alan

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-22 18:55 ` Linus Torvalds
  2018-01-23  8:01   ` Ingo Molnar
  2018-01-23 18:36   ` Alan Cox
@ 2018-01-25 18:48   ` Linus Torvalds
  2018-01-25 19:16     ` [kernel-hardening] " Linus Torvalds
  2018-01-25 21:02     ` Andy Lutomirski
  2 siblings, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 18:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

On Mon, Jan 22, 2018 at 10:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Honestly, I'd rather get rid of the fast-path entirely. Compared to
> all the PTI mess, it's not even noticeable.

So I looked at how that would be.

Patch attached. Not really "tested", but I'm running the kernel with
this patch now, and 'strace' etc works, and honestly, it seems very
obvious.

Also, code generation for 'do_syscall_64()' does not look horrible. In
fact, it doesn't look all that much worse than the fast-path ever did.

So the biggest impact of this is the extra register saves
(SAVE_EXTRA_REGS) from setting up the full ptregs. And honestly, I
hate how that stupid macro still uses "movq reg,off(%rsp)" instead of
"pushq %reg".

Considering the diffstat:

 2 files changed, 2 insertions(+), 121 deletions(-)

and how those 100+ lines are nasty assembly code, I do think we should
just do it.

Comments?

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 5124 bytes --]

 arch/x86/entry/entry_64.S   | 116 --------------------------------------------
 arch/x86/entry/syscall_64.c |   7 +--
 2 files changed, 2 insertions(+), 121 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ff6f8022612c..65f4d29be69b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -241,80 +241,6 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 
 	TRACE_IRQS_OFF
 
-	/*
-	 * If we need to do entry work or if we guess we'll need to do
-	 * exit work, go straight to the slow path.
-	 */
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
-	/*
-	 * Easy case: enable interrupts and issue the syscall.  If the syscall
-	 * needs pt_regs, we'll call a stub that disables interrupts again
-	 * and jumps to the slow path.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx
-
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path.  If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
-	call	*sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
-	movq	%rax, RAX(%rsp)
-1:
-
-	/*
-	 * If we get here, then we know that pt_regs is clean for SYSRET64.
-	 * If we see that no exit work is required (which we are required
-	 * to check with IRQs off), then we can go straight to SYSRET64.
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	1f
-
-	LOCKDEP_SYS_EXIT
-	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
-
-1:
-	/*
-	 * The fast path looked good when we started, but something changed
-	 * along the way and we need to switch to the slow path.  Calling
-	 * raise(3) will trigger this, for example.  IRQs are off.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)
-	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	jmp	return_from_SYSCALL_64
-
-entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
@@ -393,7 +319,6 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
@@ -424,47 +349,6 @@ syscall_return_via_sysret:
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-	/*
-	 * Syscalls marked as needing ptregs land here.
-	 * If we are on the fast path, we need to save the extra regs,
-	 * which we achieve by trying again on the slow path.  If we are on
-	 * the slow path, the extra regs are already saved.
-	 *
-	 * RAX stores a pointer to the C function implementing the syscall.
-	 * IRQs are on.
-	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-	jne	1f
-
-	/*
-	 * Called from fast path -- disable IRQs again, pop return address
-	 * and jump to slow path
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	popq	%rax
-	UNWIND_HINT_REGS extra=0
-	jmp	entry_SYSCALL64_slow_path
-
-1:
-	JMP_NOSPEC %rax				/* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-	UNWIND_HINT_FUNC
-	leaq	\func(%rip), %rax
-	jmp	stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include <asm/syscalls_64.h>
-
 /*
  * %rdi: prev task
  * %rsi: next task
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..c176d2fab1da 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
-
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 

* [kernel-hardening] Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 18:48   ` Linus Torvalds
@ 2018-01-25 19:16     ` Linus Torvalds
  2018-01-25 20:04       ` Brian Gerst
  2018-01-25 21:02     ` Andy Lutomirski
  1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 19:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the biggest impact of this is the extra register saves

Actually, the other noticeable part is the reloading of the argument
registers from ptregs. Together with just the extra level of
'call/ret' and the stack setup, I'm guessing we're talking maybe 20
cycles or so.

So there's the extra register saves, and simply the fact that the
fastpath had a flatter calling structure.

It still feels worth it. And if we do decide that we want to do the
register clearing on kernel entry for some paranoid mode, we'd pretty
much have to do this anyway.

                 Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 19:16     ` [kernel-hardening] " Linus Torvalds
@ 2018-01-25 20:04       ` Brian Gerst
  2018-01-25 20:54         ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Brian Gerst @ 2018-01-25 20:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 2:16 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So the biggest impact of this is the extra register saves
>
> Actually, the other noticeable part is the reloading of the argument
> registers from ptregs. Together with just the extra level of
> 'call/ret' and the stack setup, I'm guessing we're talking maybe 20
> cycles or so.
>
> So there's the extra register saves, and simply the fact that the
> fastpath had a flatter calling structure.
>
> It still feels worth it. And if we do decide that we want to do the
> register clearing on kernel entry for some paranoid mode, we'd pretty
> much have to do this anyway.
>
>                  Linus

Another extra step the slow path does is checking to see if ptregs is
safe for SYSRET.  I think that can be mitigated by moving the check to
the places that do modify ptregs (ptrace, sigreturn, and exec) which
would set a flag to force return with IRET if the modified regs do not
satisfy the criteria for SYSRET.
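
Roughly, as a sketch (the helper and the TIF_FORCE_IRET flag are names
I just made up, and only the RCX/R11 part of the criteria is shown):

/* Called from ptrace/sigreturn/exec after they write pt_regs. */
static void flag_iret_if_needed(struct task_struct *tsk,
				struct pt_regs *regs)
{
	/*
	 * SYSRET reloads RIP from RCX and RFLAGS from R11, so a modified
	 * frame is only SYSRET-safe if those still agree; the real check
	 * would also cover canonical RIP, TF/RF, and CS/SS.
	 */
	if (regs->cx != regs->ip || regs->r11 != regs->flags)
		set_tsk_thread_flag(tsk, TIF_FORCE_IRET);
}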

--
Brian Gerst

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 20:04       ` Brian Gerst
@ 2018-01-25 20:54         ` Linus Torvalds
  0 siblings, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 20:54 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 12:04 PM, Brian Gerst <brgerst@gmail.com> wrote:
>
> Another extra step the slow path does is checking to see if ptregs is
> safe for SYSRET.  I think that can be mitigated by moving the check to
> the places that do modify ptregs (ptrace, sigreturn, and exec) which
> would set a flag to force return with IRET if the modified regs do not
> satisfy the criteria for SYSRET.

I tried to do some profiling, and none of that shows up for me.

That said, what _also_ doesn't show up is the actual page table switch
on entry. And that seems to be because the per-cpu trampoline code
isn't captured by perf (or at least not shown). Oh well.

What _does_ show up a bit is this in prepare_exit_to_usermode():

#ifdef CONFIG_COMPAT
        /*
         * Compat syscalls set TS_COMPAT.  Make sure we clear it before
         * returning to user mode.  We need to clear it *after* signal
         * handling, because syscall restart has a fixup for compat
         * syscalls.  The fixup is exercised by the ptrace_syscall_32
         * selftest.
         *
         * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer
         * special case only applies after poking regs and before the
         * very next return to user mode.
         */
        current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
#endif

and I think the problem there is that it is unnecessarily dirtying
that cacheline. Afaik, those bits are already clear 99.999% of the
time.

So things would be better if that 'status' would be in the thread-info
(to keep cachelines close to the other stuff we already touch) and the
code should have something like

        if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))

or whatever.
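
Spelled out a bit more (a sketch, assuming 'status' has been moved
into struct thread_info):

	struct thread_info *ti = current_thread_info();

	/*
	 * Test first; the bits are almost always clear, so don't dirty
	 * the cacheline with an unconditional store.
	 */
	if (unlikely(ti->status & (TS_COMPAT | TS_I386_REGS_POKED)))
		ti->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);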

There might be other similar small tuning issues going on.

So there is room for improvement there in the slow path.

                Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 18:48   ` Linus Torvalds
  2018-01-25 19:16     ` [kernel-hardening] " Linus Torvalds
@ 2018-01-25 21:02     ` Andy Lutomirski
  2018-01-25 21:05       ` Thomas Gleixner
  2018-01-25 21:06       ` Linus Torvalds
  1 sibling, 2 replies; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-25 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 22, 2018 at 10:55 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Honestly, I'd rather get rid of the fast-path entirely. Compared to
>> all the PTI mess, it's not even noticeable.
>
> So I looked at how that would be.
>
> Patch attached. Not really "tested", but I'm running the kernel with
> this patch now, and 'strace' etc works, and honestly, it seems very
> obvious.
>
> Also, code generation for 'do_syscall_64()' does not look horrible. In
> fact, it doesn't look all that much worse than the fast-path ever did.
>
> So the biggest impact of this is the extra register saves
> (SAVE_EXTRA_REGS) from setting up the full ptregs. And honestly, I
> hate how that stupid macro still uses "movq reg,off(%rsp)" instead of
> "pushq %reg".
>
> Considering the diffstat:
>
>  2 files changed, 2 insertions(+), 121 deletions(-)
>
> and how those 100+ lines are nasty assembly code, I do think we should
> just do it.

Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.

Or I can grab it and send it to -tip.

Re: the trampoline not showing up: if I find some time, I'll try to
wire it up correctly in kallsyms.

--Andy

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:02     ` Andy Lutomirski
@ 2018-01-25 21:05       ` Thomas Gleixner
  2018-01-25 21:06       ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2018-01-25 21:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, 25 Jan 2018, Andy Lutomirski wrote:
> On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Jan 22, 2018 at 10:55 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> Honestly, I'd rather get rid of the fast-path entirely. Compared to
> >> all the PTI mess, it's not even noticeable.
> >
> > So I looked at how that would be.
> >
> > Patch attached. Not really "tested", but I'm running the kernel with
> > this patch now, and 'strace' etc works, and honestly, it seems very
> > obvious.
> >
> > Also, code generation for 'do_syscall_64()' does not look horrible. In
> > fact, it doesn't look all that much worse than the fast-path ever did.
> >
> > So the biggest impact of this is the extra register saves
> > (SAVE_EXTRA_REGS) from setting up the full ptregs. And honestly, I
> > hate how that stupid macro still uses "movq reg,off(%rsp)" instead of
> > "pushq %reg".
> >
> > Considering the diffstat:
> >
> >  2 files changed, 2 insertions(+), 121 deletions(-)
> >
> > and how those 100+ lines are nasty assembly code, I do think we should
> > just do it.
> 
> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
> 
> Or I can grab it and send it to -tip.

That would be nice, so we can route it through x86/pti which provides it
for backporting cleanly.

Thanks,

	tglx

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:02     ` Andy Lutomirski
  2018-01-25 21:05       ` Thomas Gleixner
@ 2018-01-25 21:06       ` Linus Torvalds
  2018-01-25 21:08         ` Andy Lutomirski
  1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 21:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

On Thu, Jan 25, 2018 at 1:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
>
> Or I can grab it and send it to -tip.

I'm not going to apply it for 4.15, I just wanted to see how it
looked, and do some minimal profiling.

From the profiles, as mentioned, moving 'status' from thread_struct to
thread_info is probably worth doing. But I didn't look at the impact
of that at all.

So it should go through all the normal channels in -tip for 4.16.

I'll happily sign off on the patch, but it was really pretty mindless,
so I'm not sure I need the authorship either.

> Re: the trampoline not showing up: if I find some time, I'll try to
> wire it up correctly in kallsyms.

That would be lovely. Right now the system call exit shows up pretty
clearly in profiles, and most of it is (obviously) the cr3 write. So
the missing entry trampoline is not insignificant.

             Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:06       ` Linus Torvalds
@ 2018-01-25 21:08         ` Andy Lutomirski
  2018-01-25 21:20           ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-25 21:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 1:06 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 1:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
>>
>> Or I can grab it and send it to -tip.
>
> I'm not going to apply it for 4.15, I just wanted to see how it
> looked, and do some minimal profiling.
>
> From the profiles, as mentioned, moving 'status' from thread_struct to
> thread_info is probably worth doing. But I didn't look at the impact
> of that at all.
>
> So it should go through all the normal channels in -tip for 4.16.
>
> I'll happily sign off on the patch, but it was really pretty mindless,
> so I'm not sure I need the authorship either.
>
>> Re: the trampoline not showing up: if I find some time, I'll try to
>> wire it up correctly in kallsyms.
>
> That would be lovely. Right now the system call exit shows up pretty
> clearly in profiles, and most of it is (obviously) the cr3 write. So
> the missing entry trampoline is not insignificant.
>

With retpoline, the retpoline in the trampoline sucks.  I don't need
perf for that -- I've benchmarked it both ways.  It sucks.  I'll fix
it, but it'll be kind of complicated.

--Andy

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:08         ` Andy Lutomirski
@ 2018-01-25 21:20           ` Linus Torvalds
  2018-01-25 21:31             ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 21:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> With retpoline, the retpoline in the trampoline sucks.  I don't need
> perf for that -- I've benchmarked it both ways.  It sucks.  I'll fix
> it, but it'll be kind of complicated.

Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).

But yeah, that is fixable even if it does require a page per CPU. Or
did you have some clever scheme in mind?

But that's independent of the slow/fast path.

          Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:20           ` Linus Torvalds
@ 2018-01-25 21:31             ` Andy Lutomirski
  2018-01-25 21:39               ` Dan Williams
  2018-01-26 11:17               ` David Laight
  0 siblings, 2 replies; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-25 21:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Dan Williams, Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 1:20 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> With retpoline, the retpoline in the trampoline sucks.  I don't need
>> perf for that -- I've benchmarked it both ways.  It sucks.  I'll fix
>> it, but it'll be kind of complicated.
>
> Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).
>
> But yeah, that is fixable even if it does require a page per CPU. Or
> did you have some clever scheme in mind?

Nothing clever.  I was going to see if I could get actual
binutils-generated relocations to work in the trampoline.  We already
have code to parse ELF relocations and turn them into a simple table,
and it shouldn't be *that* hard to run a separate pass on the entry
trampoline.

Another potentially useful if rather minor optimization would be to
rejigger the SYSCALL_DEFINE macros a bit.  Currently we treat all
syscalls like this:

long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);

I wonder if we'd be better off doing:

long func(const struct pt_regs *regs);

and autogenerating:

static long SyS_read(const struct pt_regs *regs)
{
   return sys_read(regs->di, ...);
}
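
On the caller side, the dispatch in do_syscall_64() would then
collapse to something like this (sketch only, not an actual patch):

	/* every table entry now has the same pt_regs-taking signature */
	if (likely(nr < NR_syscalls))
		regs->ax = sys_call_table[nr](regs);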

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:31             ` Andy Lutomirski
@ 2018-01-25 21:39               ` Dan Williams
  2018-01-25 21:53                 ` Andy Lutomirski
  2018-01-25 21:53                 ` Linus Torvalds
  2018-01-26 11:17               ` David Laight
  1 sibling, 2 replies; 28+ messages in thread
From: Dan Williams @ 2018-01-25 21:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 1:31 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jan 25, 2018 at 1:20 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>
>>> With retpoline, the retpoline in the trampoline sucks.  I don't need
>>> perf for that -- I've benchmarked it both ways.  It sucks.  I'll fix
>>> it, but it'll be kind of complicated.
>>
>> Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).
>>
>> But yeah, that is fixable even if it does require a page per CPU. Or
>> did you have some clever scheme in mind?
>
> Nothing clever.  I was going to see if I could get actual
> binutils-generated relocations to work in the trampoline.  We already
> have code to parse ELF relocations and turn them into a simple table,
> and it shouldn't be *that* hard to run a separate pass on the entry
> trampoline.
>
> Another potentially useful if rather minor optimization would be to
> rejigger the SYSCALL_DEFINE macros a bit.  Currently we treat all
> syscalls like this:
>
> long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
>
> I wonder if we'd be better off doing:
>
> long func(const struct pt_regs *regs);
>
> and autogenerating:
>
> static long SyS_read(const struct pt_regs *regs)
> {
>    return sys_read(regs->di, ...);
> }

If you're rejiggering, can we also put in a mechanism for detecting
which registers to clear so that userspace can't inject useful values
into speculation paths?

https://patchwork.kernel.org/patch/10153753/

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:39               ` Dan Williams
@ 2018-01-25 21:53                 ` Andy Lutomirski
  2018-01-25 21:53                 ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-25 21:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andy Lutomirski, Linus Torvalds, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 1:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Jan 25, 2018 at 1:31 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, Jan 25, 2018 at 1:20 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>
>>>> With retpoline, the retpoline in the trampoline sucks.  I don't need
>>>> perf for that -- I've benchmarked it both ways.  It sucks.  I'll fix
>>>> it, but it'll be kind of complicated.
>>>
>>> Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).
>>>
>>> But yeah, that is fixable even if it does require a page per CPU. Or
>>> did you have some clever scheme in mind?
>>
>> Nothing clever.  I was going to see if I could get actual
>> binutils-generated relocations to work in the trampoline.  We already
>> have code to parse ELF relocations and turn them into a simple table,
>> and it shouldn't be *that* hard to run a separate pass on the entry
>> trampoline.
>>
>> Another potentially useful if rather minor optimization would be to
>> rejigger the SYSCALL_DEFINE macros a bit.  Currently we treat all
>> syscalls like this:
>>
>> long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
>>
>> I wonder if we'd be better off doing:
>>
>> long func(const struct pt_regs *regs);
>>
>> and autogenerating:
>>
>> static long SyS_read(const struct pt_regs *regs)
>> {
>>    return sys_read(regs->di, ...);
>> }
>
> If you're rejiggering, can we also put in a mechanism for detecting
> which registers to clear so that userspace can't inject useful values
> into speculation paths?
>
> https://patchwork.kernel.org/patch/10153753/

My SYSCALL_DEFINE rejigger suggestion up-thread does this for free as
a side effect.

That being said, I think this would be more accurately characterized
as "so that userspace has a somewhat harder time injecting useful
values into speculation paths".

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:39               ` Dan Williams
  2018-01-25 21:53                 ` Andy Lutomirski
@ 2018-01-25 21:53                 ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-01-25 21:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andy Lutomirski, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Alan Cox, Jann Horn, Samuel Neves,
	Kernel Hardening, Borislav Petkov

On Thu, Jan 25, 2018 at 1:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> If you're rejiggering, can we also put in a mechanism for detecting
> which registers to clear so that userspace can't inject useful values
> into speculation paths?

That actually becomes trivial with just the "no fastpath" patch I sent
out. You can just clear all of them.

Sure, then do_syscall_64() will reload the six first ones, but since
those are the argument registers anyway, and since they are
caller-clobbered, they have very short lifetimes. So it would
effectively not really be an issue.

But yes, SYSCALL_DEFINEx() rejiggery would close even that tiny hole.

                   Linus

* RE: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-25 21:31             ` Andy Lutomirski
  2018-01-25 21:39               ` Dan Williams
@ 2018-01-26 11:17               ` David Laight
  2018-01-26 14:24                 ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: David Laight @ 2018-01-26 11:17 UTC (permalink / raw)
  To: 'Andy Lutomirski', Linus Torvalds
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Alan Cox,
	Jann Horn, Samuel Neves, Dan Williams, Kernel Hardening,
	Borislav Petkov

From: Andy Lutomirski
> Sent: 25 January 2018 21:31
...
> Another potentially useful if rather minor optimization would be to
> rejigger the SYSCALL_DEFINE macros a bit.  Currently we treat all
> syscalls like this:
> 
> long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
> 
> I wonder if we'd be better off doing:
> 
> long func(const struct pt_regs *regs);
> 
> and autogenerating:
> 
> static long SyS_read(const struct pt_regs *regs)
> {
>    return sys_read(regs->di, ...);
> }

Hmmm....
NetBSD (and the other BSDs?) defines a structure for the arguments to each syscall.
On systems where all function arguments are put on the stack, the part of
the user stack that contains the arguments is copied into a kernel buffer.
For amd64 I changed the register save area layout so that the arguments were in
the right order [1]. Then added an extra area for the extra arguments that had to be
read from the user stack.
Just passing a pointer into the save area has to be better than reading
all the values again.

[1] There was some horrid fallout from that :-(

	David


* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 11:17               ` David Laight
@ 2018-01-26 14:24                 ` Alan Cox
  2018-01-26 15:57                   ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Cox @ 2018-01-26 14:24 UTC (permalink / raw)
  To: David Laight, 'Andy Lutomirski', Linus Torvalds
  Cc: the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

> NetBSD (and the other BSD?) defines a structure for the arguments to
> each syscall.

Goes back to v7 or so but they put the syscall arguments into the uarea
so that no pointers were needed (uarea being a per process mapping at a
fixed address) in order to also reduce pointer dereferencing costs (not
that those matter much on modern processors)

Alan.

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 14:24                 ` Alan Cox
@ 2018-01-26 15:57                   ` Andy Lutomirski
  2018-01-26 17:40                     ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-26 15:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Laight, Andy Lutomirski, Linus Torvalds,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 6:24 AM, Alan Cox <alan@linux.intel.com> wrote:
>> NetBSD (and the other BSD?) defines a structure for the arguments to
>> each syscall.
>
> Goes back to v7 or so but they put the syscall arguments into the uarea
> so that no pointers were needed (uarea being a per process mapping at a
> fixed address) in order to also reduce pointer dereferencing costs (not
> that those matter much on modern processors)
>

I gave the rearrangement like this a try yesterday and it's a bit of a
mess.  Part of the problem is that there are a bunch of pieces of code
that expect sys_xyz() to be actual callable functions.  The best way
to deal with that is probably to switch to calling normal functions.

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 15:57                   ` Andy Lutomirski
@ 2018-01-26 17:40                     ` Linus Torvalds
  2018-01-26 18:07                       ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-26 17:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alan Cox, David Laight, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Jann Horn, Samuel Neves, Dan Williams,
	Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 7:57 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
> I gave the rearrangement like this a try yesterday and it's a bit of a
> mess.  Part of the problem is that there are a bunch of pieces of code
> that expect sys_xyz() to be actual callable functions.

That's not supposed to be a mess.

That's part of why we do that whole indirection through SYSC##xyz to
sys##_xyz: the asm-callable ones will do the full casting of
troublesome arguments (some architectures have C calling sequence
rules that have security issues, so we need to make sure that the
arguments actually follow the right rules and 'int' arguments are
properly sign-extended etc).

So that whole indirection could be made to *also* create another
version of the syscall that instead took the arguments from ptregs.

We already do exactly that for the tracing events: look how
FTRACE_SYSCALLS ends up creating that extra metadata.

The ptreg version should be done the same way: don't make 'sys_xyz()'
take a struct ptregs, instead make those SYSCALL_DEFINE*() macros
create a _new_ function called 'ptregs_xyz()' and then that function
does the argument unpacking.

Then the x86 system call table can just be switched over to call those
ptreg versions instead.
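
As a sketch of what that expansion could look like for the two-argument
case (the macro name and plumbing here are invented; di/si is the
x86-64 argument order):

#define PTREGS_STUB2(name, t1, t2)				\
long ptregs_##name(const struct pt_regs *regs)			\
{								\
	/* the casts do the sign-extension/narrowing fixup */	\
	return sys_##name((t1)regs->di, (t2)regs->si);		\
}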

                    Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 17:40                     ` Linus Torvalds
@ 2018-01-26 18:07                       ` Al Viro
  2018-01-26 18:13                         ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2018-01-26 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Alan Cox, David Laight,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 09:40:23AM -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 7:57 AM, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > I gave the rearrangement like this a try yesterday and it's a bit of a
> > mess.  Part of the problem is that there are a bunch of pieces of code
> > that expect sys_xyz() to be actual callable functions.
> 
> That's not supposed to be a mess.
> 
> That's part of why we do that whole indirection through SYSC##xyz to
> sys##_xyz: the asm-callable ones will do the full casting of
> troublesome arguments (some architectures have C calling sequence
> rules that have security issues, so we need to make sure that the
> arguments actually follow the right rules and 'int' arguments are
> properly sign-extended etc).
> 
> So that whole indirection could be made to *also* create another
> version of the syscall that instead took the arguments from ptregs.
> 
> We already do exactly that for the tracing events: look how
> FTRACE_SYSCALLS ends up creating that extra metadata.
> 
> The ptreg version should be done the same way: don't make 'sys_xyz()'
> take a struct ptregs, instead make those SYSCALL_DEFINE*() macros
> create a _new_ function called 'ptregs_xyz()' and then that function
> does the argument unpacking.
> 
> Then the x86 system call table can just be switched over to call those
> ptreg versions instead.

Umm...  What about other architectures?  Or do you want SYSCALL_DEFINE...
to be per-arch?  I wonder how much would that "go through pt_regs" hurt
on something like sparc...

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 18:07                       ` Al Viro
@ 2018-01-26 18:13                         ` Linus Torvalds
  2018-01-26 18:23                           ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-26 18:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Alan Cox, David Laight,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  What about other architectures?  Or do you want SYSCALL_DEFINE...
> to be per-arch?  I wonder how much would that "go through pt_regs" hurt
> on something like sparc...

No, but I just talked to Will Deacon about register clearing on entry,
and so I suspect that arm64 might want something similar too.

So I think some opt-in for letting architectures add their own
function would be good. Because it wouldn't be all architectures, but
it probably _would_ be more than just x86.

You need to add architecture-specific "load argX from ptregs" macros anyway.

             Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 18:13                         ` Linus Torvalds
@ 2018-01-26 18:23                           ` Andy Lutomirski
  2018-01-26 18:54                             ` Linus Torvalds
  2018-01-29 13:19                             ` Will Deacon
  0 siblings, 2 replies; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-26 18:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Andy Lutomirski, Alan Cox, David Laight,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 10:13 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Umm...  What about other architectures?  Or do you want SYSCALL_DEFINE...
>> to be per-arch?  I wonder how much would that "go through pt_regs" hurt
>> on something like sparc...
>
> No, but I just talked to Will Deacon about register clearing on entry,
> and so I suspect that arm64 might want something similar too.
>
> So I think some opt-in for letting architectures add their own
> function would be good. Because it wouldn't be all architectures, but
> it probably _would_ be more than just x86.
>
> You need to add architecture-specific "load argX from ptregs" macros anyway.

I mocked that up, and it's straightforward.  I ended up with something like:

#define __ARCH_SYSCALL_ARGS(n, ...) (regs->di, ...)

(obviously modified so it actually compiles.)
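
One way it might compile (accessor names invented, following the x86-64
syscall argument order):

#define __ARCH_SYSCALL_ARG1(regs)	((regs)->di)
#define __ARCH_SYSCALL_ARG2(regs)	((regs)->si)
#define __ARCH_SYSCALL_ARG3(regs)	((regs)->dx)
#define __ARCH_SYSCALL_ARG4(regs)	((regs)->r10)
#define __ARCH_SYSCALL_ARG5(regs)	((regs)->r8)
#define __ARCH_SYSCALL_ARG6(regs)	((regs)->r9)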

The issue is that doing it this way gives us, effectively:

long sys_foo(int a, int b)
{
  body here;
}

long SyS_foo(const struct pt_regs *regs)
{
  return sys_foo(regs->di, regs->si);
}

whereas what we want is *static* long sys_foo(...).  So I could split
the macros into:

DEFINE_SYSCALL2(foo, ....)

and

DEFINE_EXTERN_SYSCALL2(foo, ...)

or I could just fix up all the code that expects calling sys_foo()
across files to work.

My mockup patch doesn't actually work because of compat crap, but
that's manageable.

--Andy

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 18:23                           ` Andy Lutomirski
@ 2018-01-26 18:54                             ` Linus Torvalds
  2018-01-26 19:02                               ` Andy Lutomirski
  2018-01-29 13:19                             ` Will Deacon
  1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2018-01-26 18:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Alan Cox, David Laight, the arch/x86 maintainers, LKML,
	Greg Kroah-Hartman, Jann Horn, Samuel Neves, Dan Williams,
	Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 10:23 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
> The issue is that doing it this way gives us, effectively:
>
> long sys_foo(int a, int b)
> {
>   body here;
> }
>
> long SyS_foo(const struct pt_regs *regs)
> {
>   return sys_foo(regs->di, regs->si);
> }
>
> whereas what we want is *static* long sys_foo(...).

How about just marking 'sys_foo()' as being always_inline (but still
not static)? Because the case that _matters_ is that SyS_foo() thing
when this is enabled.

Sure, you'll get two copies of the code (one in SyS_foo(), the other
being the callable-from C 'sys_foo()' that is exported and almost
never used). But that seems a fairly small price to pay. We could
leave it for later to try to get rid of the unused copies entirely.

             Linus

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 18:54                             ` Linus Torvalds
@ 2018-01-26 19:02                               ` Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2018-01-26 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Al Viro, Alan Cox, David Laight,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

On Fri, Jan 26, 2018 at 10:54 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jan 26, 2018 at 10:23 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> The issue is that doing it this way gives us, effectively:
>>
>> long sys_foo(int a, int b)
>> {
>>   body here;
>> }
>>
>> long SyS_foo(const struct pt_regs *regs)
>> {
>>   return sys_foo(regs->di, regs->si);
>> }
>>
>> whereas what we want is *static* long sys_foo(...).
>
> How about just marking 'sys_foo()' as being always_inline (but still
> not static)? Because the case that _matters_ is that SyS_foo() thing
> when this is enabled.
>
> Sure, you'll get two copies of the code (one in SyS_foo(), the other
> being the callable-from C 'sys_foo()' that is exported and almost
> never used). But that seems a fairly small price to pay. We could
> leave it for later to try to get rid of the unused copies entirely.
>

I could do that, but Josh Triplett will yell at me.

Anyway, I'll fiddle with it.  This isn't exactly high priority.

* Re: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-26 18:23                           ` Andy Lutomirski
  2018-01-26 18:54                             ` Linus Torvalds
@ 2018-01-29 13:19                             ` Will Deacon
  2018-01-29 15:23                               ` David Laight
  1 sibling, 1 reply; 28+ messages in thread
From: Will Deacon @ 2018-01-29 13:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Al Viro, Alan Cox, David Laight,
	the arch/x86 maintainers, LKML, Greg Kroah-Hartman, Jann Horn,
	Samuel Neves, Dan Williams, Kernel Hardening, Borislav Petkov

Hi Andy,

On Fri, Jan 26, 2018 at 10:23:23AM -0800, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 10:13 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >>
> >> Umm...  What about other architectures?  Or do you want SYSCALL_DEFINE...
> >> to be per-arch?  I wonder how much would that "go through pt_regs" hurt
> >> on something like sparc...
> >
> > No, but I just talked to Will Deacon about register clearing on entry,
> > and so I suspect that arm64 might want something similar too.
> >
> > So I think some opt-in for letting architectures add their own
> > function would be good. Because it wouldn't be all architectures, but
> > it probably _would_ be more than just x86.
> >
> > You need to add architecture-specific "load argX from ptregs" macros anyway.
> 
> I mocked that up, and it's straightforward.  I ended up with something like:
> 
> #define __ARCH_SYSCALL_ARGS(n, ...) (regs->di, ...)
> 
> (obviously modified so it actually compiles.)
> 
> The issue is that doing it this way gives us, effectively:
> 
> long sys_foo(int a, int b)
> {
>   body here;
> }
> 
> long SyS_foo(const struct pt_regs *regs)
> {
>   return sys_foo(regs->di, regs->si);
> }
> 
> whereas what we want is *static* long sys_foo(...).  So I could split
> the macros into:
> 
> DEFINE_SYSCALL2(foo, ....)
> 
> and
> 
> DEFINE_EXTERN_SYSCALL2(foo, ...)
> 
> or I could just fix up all the code that expects calling sys_foo()
> across files to work.

Another issue with this style of macro definition exists on architectures
where the calling convention needs you to carry state around depending on
how you packed the previous parameters. For example, on 32-bit ARM, 64-bit
values are passed in adjacent pairs of registers but the low numbered
register needs to be even. This is what stopped me from trying to use
existing helpers such as syscall_get_arguments to unpack the pt_regs
and it generally means that anything that says "get me argument n" is going
to require constructing arguments 0..n-1 first.

To do this properly I think we'll either need to pass back the size and
current register offset to the arch code, or just allow the thing to be
overridden per syscall (the case above isn't especially frequent).
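
For a concrete example (assuming the usual 32-bit ARM EABI syscall
layout), take pread64:

/*
 * long sys_pread64(unsigned int fd, char *buf, size_t count, loff_t pos);
 *
 * r0 = fd, r1 = buf, r2 = count, r3 = unused pad, r4/r5 = pos
 * (64-bit args start at an even register), so "get me argument 3"
 * depends on the sizes of arguments 0..2.
 */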

Will

* RE: [PATCH] x86/retpoline/entry: Disable the entire SYSCALL64 fast path with retpolines on
  2018-01-29 13:19                             ` Will Deacon
@ 2018-01-29 15:23                               ` David Laight
  0 siblings, 0 replies; 28+ messages in thread
From: David Laight @ 2018-01-29 15:23 UTC (permalink / raw)
  To: 'Will Deacon', Andy Lutomirski
  Cc: Linus Torvalds, Al Viro, Alan Cox, the arch/x86 maintainers,
	LKML, Greg Kroah-Hartman, Jann Horn, Samuel Neves, Dan Williams,
	Kernel Hardening, Borislav Petkov

From: Will Deacon
> Sent: 29 January 2018 13:20
...
> Another issue with this style of macro definition exists on architectures
> where the calling convention needs you to carry state around depending on
> how you packed the previous parameters. For example, on 32-bit ARM, 64-bit
> values are passed in adjacent pairs of registers but the low numbered
> register needs to be even. This is what stopped me from trying to use
> existing helpers such as syscall_get_arguments to unpack the pt_regs
> and it generally means that anything that says "get me argument n" is going
> to require constructing arguments 0..n-1 first.
> 
> To do this properly I think we'll either need to pass back the size and
> current register offset to the arch code, or just allow the thing to be
> overridden per syscall (the case above isn't especially frequent).

If you generate a structure from the argument list, that might work
'by magic'.
Certainly you can add explicit pads to any structure.
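
Sketch of that, NetBSD-style (the struct and field names are invented):

/*
 * The explicit pad mirrors the even-register alignment, so the same
 * layout describes both the register and the stack pictures.
 */
struct pread64_args {
	unsigned int	fd;
	char		*buf;
	unsigned int	count;
	unsigned int	__pad;		/* keeps 'pos' 8-byte aligned */
	long long	pos;
};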

	David

	
