* [PATCH 0/3] x86/entry: clear registers to sanitize speculative usages @ 2018-02-03 23:21 Dan Williams 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Dan Williams @ 2018-02-03 23:21 UTC (permalink / raw) To: tglx Cc: Andi Kleen, x86, linux-kernel, Ingo Molnar, luto, H. Peter Anvin, torvalds At entry userspace may have populated callee saved registers with values that could be useful in a speculative execution attack. Clear them to minimize the kernel's attack surface. Note, this is done to make it harder to find / manipulate exploitable sequences in the kernel. The clearing is limited to the 64-bit 'extra' registers since those are the most likely to survive with user populated values deep into the call chain. Normal register pressure likely clobbers values in the lower registers and the 32-bit case. As for cycle impact on my Sandy Bridge test system it can handle the xor sequence at 3.5 instructions per cycle. --- Andi Kleen (2): x86/entry: Clear registers for 64bit exceptions/interrupts x86/entry: Clear registers for compat syscalls Dan Williams (1): x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels arch/x86/entry/calling.h | 17 +++++++++++++++++ arch/x86/entry/entry_64.S | 6 ++++++ arch/x86/entry/entry_64_compat.S | 3 +++ 3 files changed, 26 insertions(+) ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-03 23:21 [PATCH 0/3] x86/entry: clear registers to sanitize speculative usages Dan Williams @ 2018-02-03 23:21 ` Dan Williams 2018-02-04 0:14 ` Andy Lutomirski 2018-02-04 13:01 ` Brian Gerst 2018-02-03 23:21 ` [PATCH 2/3] x86/entry: Clear registers for 64bit exceptions/interrupts Dan Williams 2018-02-03 23:21 ` [PATCH 3/3] x86/entry: Clear registers for compat syscalls Dan Williams 2 siblings, 2 replies; 18+ messages in thread From: Dan Williams @ 2018-02-03 23:21 UTC (permalink / raw) To: tglx Cc: Andi Kleen, x86, linux-kernel, Ingo Molnar, luto, H. Peter Anvin, torvalds At entry userspace may have populated the extra registers outside the syscall calling convention with values that could be useful in a speculative execution attack. Clear them to minimize the kernel's attack surface. Note, this only clears the extra registers and not the unused registers for syscalls less than 6 arguments since those registers are likely to be clobbered well before their values could be put to use under speculation. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Andy Lutomirski <luto@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/entry/calling.h | 17 +++++++++++++++++ arch/x86/entry/entry_64.S | 1 + 2 files changed, 18 insertions(+) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 3f48f695d5e6..daee2d19e73d 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with UNWIND_HINT_REGS offset=\offset .endm + /* + * Sanitize extra registers of values that a speculation attack + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, + * the expectation is that %ebp will be clobbered before it + * could be used. + */ + .macro CLEAR_EXTRA_REGS_NOSPEC + xorq %r15, %r15 + xorq %r14, %r14 + xorq %r13, %r13 + xorq %r12, %r12 + xorl %ebx, %ebx +#ifndef CONFIG_FRAME_POINTER + xorl %ebp, %ebp +#endif + .endm + .macro POP_EXTRA_REGS popq %r15 popq %r14 diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c752abe89d80..46260e951da6 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) TRACE_IRQS_OFF /* IRQs are off. */ + CLEAR_EXTRA_REGS_NOSPEC movq %rsp, %rdi call do_syscall_64 /* returns with IRQs disabled */ ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams @ 2018-02-04 0:14 ` Andy Lutomirski 2018-02-04 1:25 ` Dan Williams 2018-02-04 13:01 ` Brian Gerst 1 sibling, 1 reply; 18+ messages in thread From: Andy Lutomirski @ 2018-02-04 0:14 UTC (permalink / raw) To: Dan Williams Cc: Thomas Gleixner, Andi Kleen, X86 ML, LKML, Ingo Molnar, Andrew Lutomirski, H. Peter Anvin, Linus Torvalds On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams <dan.j.williams@intel.com> wrote: > At entry userspace may have populated the extra registers outside the > syscall calling convention with values that could be useful in a > speculative execution attack. Clear them to minimize the kernel's attack > surface. Note, this only clears the extra registers and not the unused > registers for syscalls less than 6 arguments since those registers are > likely to be clobbered well before their values could be put to use > under speculation. > > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: x86@kernel.org > Cc: Andy Lutomirski <luto@kernel.org> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> > Reported-by: Andi Kleen <ak@linux.intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > arch/x86/entry/calling.h | 17 +++++++++++++++++ > arch/x86/entry/entry_64.S | 1 + > 2 files changed, 18 insertions(+) > > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h > index 3f48f695d5e6..daee2d19e73d 100644 > --- a/arch/x86/entry/calling.h > +++ b/arch/x86/entry/calling.h > @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with > UNWIND_HINT_REGS offset=\offset > .endm > > + /* > + * Sanitize extra registers of values that a speculation attack > + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, > + * the expectation is that %ebp will be clobbered before it > + * could be used. > + */ > + .macro CLEAR_EXTRA_REGS_NOSPEC > + xorq %r15, %r15 > + xorq %r14, %r14 > + xorq %r13, %r13 > + xorq %r12, %r12 > + xorl %ebx, %ebx > +#ifndef CONFIG_FRAME_POINTER > + xorl %ebp, %ebp > +#endif > + .endm > + Can we make the clears only happen if we have CONFIG_RETPOLINE? Or is there maybe some reason why we want this even without retpolines on? > .macro POP_EXTRA_REGS > popq %r15 > popq %r14 > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index c752abe89d80..46260e951da6 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) > TRACE_IRQS_OFF > > /* IRQs are off. */ > + CLEAR_EXTRA_REGS_NOSPEC Please put the clears before TRACE_IRQS_OFF to protect users that use tracing. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-04 0:14 ` Andy Lutomirski @ 2018-02-04 1:25 ` Dan Williams 2018-02-04 1:29 ` Andy Lutomirski 0 siblings, 1 reply; 18+ messages in thread From: Dan Williams @ 2018-02-04 1:25 UTC (permalink / raw) To: Andy Lutomirski Cc: Thomas Gleixner, Andi Kleen, X86 ML, LKML, Ingo Molnar, H. Peter Anvin, Linus Torvalds On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski <luto@kernel.org> wrote: > On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams <dan.j.williams@intel.com> wrote: >> At entry userspace may have populated the extra registers outside the >> syscall calling convention with values that could be useful in a >> speculative execution attack. Clear them to minimize the kernel's attack >> surface. Note, this only clears the extra registers and not the unused >> registers for syscalls less than 6 arguments since those registers are >> likely to be clobbered well before their values could be put to use >> under speculation. >> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: x86@kernel.org >> Cc: Andy Lutomirski <luto@kernel.org> >> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> >> Reported-by: Andi Kleen <ak@linux.intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> arch/x86/entry/calling.h | 17 +++++++++++++++++ >> arch/x86/entry/entry_64.S | 1 + >> 2 files changed, 18 insertions(+) >> >> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h >> index 3f48f695d5e6..daee2d19e73d 100644 >> --- a/arch/x86/entry/calling.h >> +++ b/arch/x86/entry/calling.h >> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with >> UNWIND_HINT_REGS offset=\offset >> .endm >> >> + /* >> + * Sanitize extra registers of values that a speculation attack >> + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, >> + * the expectation is that %ebp will be clobbered before it >> + * could be used. >> + */ >> + .macro CLEAR_EXTRA_REGS_NOSPEC >> + xorq %r15, %r15 >> + xorq %r14, %r14 >> + xorq %r13, %r13 >> + xorq %r12, %r12 >> + xorl %ebx, %ebx >> +#ifndef CONFIG_FRAME_POINTER >> + xorl %ebp, %ebp >> +#endif >> + .endm >> + > > Can we make the clears only happen if we have CONFIG_RETPOLINE? Or is > there maybe some reason why we want this even without retpolines on? We have the other Spectre variant1 mitigations on by default. I'm not opposed to adding a config to turn them all off, but I think we should be consistent either way, and I don't think CONFIG_RETPOLINE is the right config to gate those. >> .macro POP_EXTRA_REGS >> popq %r15 >> popq %r14 >> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >> index c752abe89d80..46260e951da6 100644 >> --- a/arch/x86/entry/entry_64.S >> +++ b/arch/x86/entry/entry_64.S >> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) >> TRACE_IRQS_OFF >> >> /* IRQs are off. */ >> + CLEAR_EXTRA_REGS_NOSPEC > > Please put the clears before TRACE_IRQS_OFF to protect users that use tracing. Ok. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-04 1:25 ` Dan Williams @ 2018-02-04 1:29 ` Andy Lutomirski 0 siblings, 0 replies; 18+ messages in thread From: Andy Lutomirski @ 2018-02-04 1:29 UTC (permalink / raw) To: Dan Williams Cc: Andy Lutomirski, Thomas Gleixner, Andi Kleen, X86 ML, LKML, Ingo Molnar, H. Peter Anvin, Linus Torvalds On Sun, Feb 4, 2018 at 1:25 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski <luto@kernel.org> wrote: >> On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams <dan.j.williams@intel.com> wrote: >>> At entry userspace may have populated the extra registers outside the >>> syscall calling convention with values that could be useful in a >>> speculative execution attack. Clear them to minimize the kernel's attack >>> surface. Note, this only clears the extra registers and not the unused >>> registers for syscalls less than 6 arguments since those registers are >>> likely to be clobbered well before their values could be put to use >>> under speculation. >>> >>> Cc: Thomas Gleixner <tglx@linutronix.de> >>> Cc: Ingo Molnar <mingo@redhat.com> >>> Cc: "H. Peter Anvin" <hpa@zytor.com> >>> Cc: x86@kernel.org >>> Cc: Andy Lutomirski <luto@kernel.org> >>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> >>> Reported-by: Andi Kleen <ak@linux.intel.com> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >>> --- >>> arch/x86/entry/calling.h | 17 +++++++++++++++++ >>> arch/x86/entry/entry_64.S | 1 + >>> 2 files changed, 18 insertions(+) >>> >>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h >>> index 3f48f695d5e6..daee2d19e73d 100644 >>> --- a/arch/x86/entry/calling.h >>> +++ b/arch/x86/entry/calling.h >>> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with >>> UNWIND_HINT_REGS offset=\offset >>> .endm >>> >>> + /* >>> + * Sanitize extra registers of values that a speculation attack >>> + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, >>> + * the expectation is that %ebp will be clobbered before it >>> + * could be used. >>> + */ >>> + .macro CLEAR_EXTRA_REGS_NOSPEC >>> + xorq %r15, %r15 >>> + xorq %r14, %r14 >>> + xorq %r13, %r13 >>> + xorq %r12, %r12 >>> + xorl %ebx, %ebx >>> +#ifndef CONFIG_FRAME_POINTER >>> + xorl %ebp, %ebp >>> +#endif >>> + .endm >>> + >> >> Can we make the clears only happen if we have CONFIG_RETPOLINE? Or is >> there maybe some reason why we want this even without retpolines on? > > We have the other Spectre variant1 mitigations on by default. I'm not > opposed to adding a config to turn them all off, but I think we should > be consistent either way, and I don't think CONFIG_RETPOLINE is the > right config to gate those. Fair enough. > >>> .macro POP_EXTRA_REGS >>> popq %r15 >>> popq %r14 >>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >>> index c752abe89d80..46260e951da6 100644 >>> --- a/arch/x86/entry/entry_64.S >>> +++ b/arch/x86/entry/entry_64.S >>> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) >>> TRACE_IRQS_OFF >>> >>> /* IRQs are off. */ >>> + CLEAR_EXTRA_REGS_NOSPEC >> >> Please put the clears before TRACE_IRQS_OFF to protect users that use tracing. > > Ok. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams 2018-02-04 0:14 ` Andy Lutomirski @ 2018-02-04 13:01 ` Brian Gerst 2018-02-04 17:42 ` Dan Williams 1 sibling, 1 reply; 18+ messages in thread From: Brian Gerst @ 2018-02-04 13:01 UTC (permalink / raw) To: Dan Williams Cc: Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin, Linus Torvalds On Sat, Feb 3, 2018 at 6:21 PM, Dan Williams <dan.j.williams@intel.com> wrote: > At entry userspace may have populated the extra registers outside the > syscall calling convention with values that could be useful in a > speculative execution attack. Clear them to minimize the kernel's attack > surface. Note, this only clears the extra registers and not the unused > registers for syscalls less than 6 arguments since those registers are > likely to be clobbered well before their values could be put to use > under speculation. > > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: x86@kernel.org > Cc: Andy Lutomirski <luto@kernel.org> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> > Reported-by: Andi Kleen <ak@linux.intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > arch/x86/entry/calling.h | 17 +++++++++++++++++ > arch/x86/entry/entry_64.S | 1 + > 2 files changed, 18 insertions(+) > > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h > index 3f48f695d5e6..daee2d19e73d 100644 > --- a/arch/x86/entry/calling.h > +++ b/arch/x86/entry/calling.h > @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with > UNWIND_HINT_REGS offset=\offset > .endm > > + /* > + * Sanitize extra registers of values that a speculation attack > + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, > + * the expectation is that %ebp will be clobbered before it > + * could be used. > + */ > + .macro CLEAR_EXTRA_REGS_NOSPEC > + xorq %r15, %r15 > + xorq %r14, %r14 > + xorq %r13, %r13 > + xorq %r12, %r12 > + xorl %ebx, %ebx > +#ifndef CONFIG_FRAME_POINTER > + xorl %ebp, %ebp > +#endif > + .endm > + > .macro POP_EXTRA_REGS > popq %r15 > popq %r14 > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index c752abe89d80..46260e951da6 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) > TRACE_IRQS_OFF > > /* IRQs are off. */ > + CLEAR_EXTRA_REGS_NOSPEC > movq %rsp, %rdi > call do_syscall_64 /* returns with IRQs disabled */ > > Now that the fast syscall path is gone, all regs (except RSP obviously) are dead after being saved to pt_regs. -- Brian Gerst ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-04 13:01 ` Brian Gerst @ 2018-02-04 17:42 ` Dan Williams 2018-02-04 18:40 ` Linus Torvalds 0 siblings, 1 reply; 18+ messages in thread From: Dan Williams @ 2018-02-04 17:42 UTC (permalink / raw) To: Brian Gerst Cc: Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin, Linus Torvalds On Sun, Feb 4, 2018 at 5:01 AM, Brian Gerst <brgerst@gmail.com> wrote: > On Sat, Feb 3, 2018 at 6:21 PM, Dan Williams <dan.j.williams@intel.com> wrote: >> At entry userspace may have populated the extra registers outside the >> syscall calling convention with values that could be useful in a >> speculative execution attack. Clear them to minimize the kernel's attack >> surface. Note, this only clears the extra registers and not the unused >> registers for syscalls less than 6 arguments since those registers are >> likely to be clobbered well before their values could be put to use >> under speculation. >> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: x86@kernel.org >> Cc: Andy Lutomirski <luto@kernel.org> >> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> >> Reported-by: Andi Kleen <ak@linux.intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> arch/x86/entry/calling.h | 17 +++++++++++++++++ >> arch/x86/entry/entry_64.S | 1 + >> 2 files changed, 18 insertions(+) >> >> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h >> index 3f48f695d5e6..daee2d19e73d 100644 >> --- a/arch/x86/entry/calling.h >> +++ b/arch/x86/entry/calling.h >> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with >> UNWIND_HINT_REGS offset=\offset >> .endm >> >> + /* >> + * Sanitize extra registers of values that a speculation attack >> + * might want to exploit. In the CONFIG_FRAME_POINTER=y case, >> + * the expectation is that %ebp will be clobbered before it >> + * could be used. >> + */ >> + .macro CLEAR_EXTRA_REGS_NOSPEC >> + xorq %r15, %r15 >> + xorq %r14, %r14 >> + xorq %r13, %r13 >> + xorq %r12, %r12 >> + xorl %ebx, %ebx >> +#ifndef CONFIG_FRAME_POINTER >> + xorl %ebp, %ebp >> +#endif >> + .endm >> + >> .macro POP_EXTRA_REGS >> popq %r15 >> popq %r14 >> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >> index c752abe89d80..46260e951da6 100644 >> --- a/arch/x86/entry/entry_64.S >> +++ b/arch/x86/entry/entry_64.S >> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) >> TRACE_IRQS_OFF >> >> /* IRQs are off. */ >> + CLEAR_EXTRA_REGS_NOSPEC >> movq %rsp, %rdi >> call do_syscall_64 /* returns with IRQs disabled */ >> >> > > Now that the fast syscall path is gone, all regs (except RSP > obviously) are dead after being saved to pt_regs. They're saved, but not dead afaics. The concern is that when the CPU starts speculating it could be in a code path that has not yet touched the extra registers. Once that happens userspace could potentially load data from an arbitrary address. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-04 17:42 ` Dan Williams @ 2018-02-04 18:40 ` Linus Torvalds 2018-02-05 16:26 ` Ingo Molnar 0 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2018-02-04 18:40 UTC (permalink / raw) To: Dan Williams Cc: Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin On Sun, Feb 4, 2018 at 9:42 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Sun, Feb 4, 2018 at 5:01 AM, Brian Gerst <brgerst@gmail.com> wrote: >> >> Now that the fast syscall path is gone, all regs (except RSP >> obviously) are dead after being saved to pt_regs. > > They're saved, but not dead afaics. Actually, they _are_ dead with the slow-path - it will reload them off the ptregs pointer instead. However, exactly because the slow-path will just reload the user-supplied values, there's no point in clearing them anyway. Those values will be live in the system call sequence anyway. So even for the slow system call case, there's no point in clearing them. If we ever end up only reloading the required values (ie we push the 'struct ptregs' pointer lower down into the syscall stack), at that point we might want to clear those registers, because many system calls will reload only a small subset. But as the commit message says, the system call argument registers are also likely to be aggressively clobbered unless used, since the low registers are preferred for code generation (smaller code, and many of them are special anyway in various ways and have forced uses for shifts, function arguments, or just are special in general like %rax). So the actual argument registers tend to not be an issue anyway. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-04 18:40 ` Linus Torvalds @ 2018-02-05 16:26 ` Ingo Molnar 2018-02-05 16:38 ` Andy Lutomirski 2018-02-05 20:05 ` Ingo Molnar 0 siblings, 2 replies; 18+ messages in thread From: Ingo Molnar @ 2018-02-05 16:26 UTC (permalink / raw) To: Linus Torvalds Cc: Dan Williams, Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin * Linus Torvalds <torvalds@linux-foundation.org> wrote: > [...] > > But as the commit message says, the system call argument registers are > also likely to be aggressively clobbered unless used, since the low > registers are preferred for code generation (smaller code, and many of > them are special anyway in various ways and have forced uses for > shifts, function arguments, or just are special in general like %rax). > > So the actual argument registers tend to not be an issue anyway. Btw., to underline these arguments, here's some statistical data about actual register usage the x86 kernel. I picked the latest upstream kernel and did a statistical analysis of the disassembly of an 'allyesconfig' 64-bit build. Here is the histogram of GP register usage, ordered by a calculated "average per-function usage ratio" (last column): # nr of =y .config options: 9553 # nr of functions: 249340 # nr of instructions: 20223765 # nr of register uses: 33413619 register | # of uses | avg uses per fn -------------------------------------- %r11 | 21564 | 0.1 %r10 | 65499 | 0.3 %r9 | 162040 | 0.6 %r8 | 292779 | 1.2 %rcx | 860528 | 3.5 %r15 | 1414816 | 5.7 %r14 | 1597952 | 6.4 %rsi | 1636660 | 6.6 %rdx | 1798109 | 7.2 %r13 | 1829557 | 7.3 %r12 | 2301476 | 9.2 %rbp | 3156682 | 12.7 %rbx | 4451880 | 17.9 %rdi | 4747951 | 19.0 %rax | 5370191 | 21.5 Here is the same histogram for a distro kernel (Fedora) config based build, with all =m modules changed to =y and thus built into the vmlinux for easier analysis: # nr of =y .config options: 4871 # nr of functions: 190477 # nr of instructions: 10329411 # nr of register uses: 16907185 register | # of uses | avg uses per fn -------------------------------------- %r11 | 64135 | 0.3 %r10 | 113366 | 0.6 %r9 | 196269 | 1.0 %r8 | 314812 | 1.7 %r15 | 404789 | 2.1 %r14 | 461266 | 2.4 %r13 | 569654 | 3.0 %r12 | 763973 | 4.0 %rcx | 920477 | 4.8 %rbp | 1161700 | 6.1 %rsi | 1257150 | 6.6 %rdi | 1625617 | 8.5 %rdx | 1667365 | 8.8 %rbx | 1739660 | 9.1 %rax | 3826187 | 20.1 Finally here's the histogram of a 'defconfig' build - which should be representative of 'device specific kernel builds': # nr of =y .config options: 1255 # nr of functions: 45490 # nr of instructions: 1963956 # nr of register uses: 3183680 register | # of uses | avg uses per fn -------------------------------------- %r11 | 11608 | 0.3 %r10 | 23398 | 0.5 %r9 | 37431 | 0.8 %r8 | 56140 | 1.2 %r15 | 77468 | 1.7 %r14 | 89285 | 2.0 %r13 | 111665 | 2.5 %r12 | 151977 | 3.3 %rcx | 166425 | 3.7 %rsi | 226536 | 5.0 %rbp | 238286 | 5.2 %rdi | 306709 | 6.7 %rdx | 313569 | 6.9 %rbx | 349496 | 7.7 %rax | 728036 | 16.0 (Note the various caveats listed further below.) These three builds I believe provide representative members of a wide spectrum of kernel options used in practice: from everything-enabled, through distro-enabled to device-specific minimal kernels. There's a consistent pattern in these histograms: the least used registers are R11, R10, R9 and R8. Registers R12-R15 are used almost as frequently as some of the GP registers (!). In practice R11-R10 is probably the most vulnerable ones to attack: their use is at least 1-2 orders of magnitude less common than that of the more common general purpose registers. So I submit that we should probably extend the register clearing/sanitization to R10 and R11 as well, because while they are technically caller-saved and freely clobberable, in practice they don't get clobbered all that often and there might be various code paths into complex system calls where these R10/R11 values survive just fine and can be used in Spectre gadgets. Thanks, Ingo P.S.: List of caveats/notes: Note #1: I collapsed all 32-bit register users which zero-extend by default. I did not collapse 8 and 16 bit uses as they don't automatically clobber the higher bits. ) Note #2: This histogram does not make a distinction between read and write uses of registers. Note #3: I did not include implicit register clobbering, only those registers that are explicitly listed in the disassembly. In the overwhelming majority of cases the affected registers are listed though, so the real numbers should be very close though. Note #4: The 'avg uses per fn' number is over-estimates the real uses per function, because I counted total number of uses, not rounded down to a per function register usage heat-map. I believe this does not change the _ordering_ of the register usage histograms, so it's a valid simplification. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 16:26 ` Ingo Molnar @ 2018-02-05 16:38 ` Andy Lutomirski 2018-02-05 18:29 ` Ingo Molnar 2018-02-05 20:05 ` Ingo Molnar 1 sibling, 1 reply; 18+ messages in thread From: Andy Lutomirski @ 2018-02-05 16:38 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Dan Williams, Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin On Mon, Feb 5, 2018 at 4:26 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> [...] >> >> But as the commit message says, the system call argument registers are >> also likely to be aggressively clobbered unless used, since the low >> registers are preferred for code generation (smaller code, and many of >> them are special anyway in various ways and have forced uses for >> shifts, function arguments, or just are special in general like %rax). >> >> So the actual argument registers tend to not be an issue anyway. > > > So I submit that we should probably extend the register clearing/sanitization to > R10 and R11 as well, because while they are technically caller-saved and freely > clobberable, in practice they don't get clobbered all that often and there might > be various code paths into complex system calls where these R10/R11 values survive > just fine and can be used in Spectre gadgets. Maybe R11, but we have to be careful, since R11 is used as scratch space in a bunch of the asm. Clearing R10 is mostly useless in the syscall path because we'll just unconditionally reload it in do_syscall_64(). If we manage to change the way syscall wrappers work, then we can think about clearing R10 and maybe even more regs. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 16:38 ` Andy Lutomirski @ 2018-02-05 18:29 ` Ingo Molnar 2018-02-05 18:47 ` Brian Gerst 0 siblings, 1 reply; 18+ messages in thread From: Ingo Molnar @ 2018-02-05 18:29 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Dan Williams, Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin * Andy Lutomirski <luto@kernel.org> wrote: > [...] Clearing R10 is mostly useless in the syscall path because we'll just > unconditionally reload it in do_syscall_64(). AFAICS do_syscall_64() doesn't touch R10 at all. So how does it reload R10? In fact do_syscall_64() as a C function does not touch R10, R11, R12, R13, R14, R15 - it passes their values through. What am I missing? Thanks, Ingo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 18:29 ` Ingo Molnar @ 2018-02-05 18:47 ` Brian Gerst 2018-02-05 19:48 ` Ingo Molnar 0 siblings, 1 reply; 18+ messages in thread From: Brian Gerst @ 2018-02-05 18:47 UTC (permalink / raw) To: Ingo Molnar Cc: Andy Lutomirski, Linus Torvalds, Dan Williams, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin On Mon, Feb 5, 2018 at 1:29 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * Andy Lutomirski <luto@kernel.org> wrote: > >> [...] Clearing R10 is mostly useless in the syscall path because we'll just >> unconditionally reload it in do_syscall_64(). > > AFAICS do_syscall_64() doesn't touch R10 at all. So how does it reload R10? > > In fact do_syscall_64() as a C function does not touch R10, R11, R12, R13, R14, > R15 - it passes their values through. > > What am I missing? The syscall ABI uses R10 for the 4th argument instead of RCX, because RCX gets clobbered by the SYSCALL instruction for RIP. -- Brian Gerst ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 18:47 ` Brian Gerst @ 2018-02-05 19:48 ` Ingo Molnar 2018-02-05 20:25 ` Andy Lutomirski 0 siblings, 1 reply; 18+ messages in thread From: Ingo Molnar @ 2018-02-05 19:48 UTC (permalink / raw) To: Brian Gerst Cc: Andy Lutomirski, Linus Torvalds, Dan Williams, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin * Brian Gerst <brgerst@gmail.com> wrote: > On Mon, Feb 5, 2018 at 1:29 PM, Ingo Molnar <mingo@kernel.org> wrote: > > > > * Andy Lutomirski <luto@kernel.org> wrote: > > > >> [...] Clearing R10 is mostly useless in the syscall path because we'll just > >> unconditionally reload it in do_syscall_64(). > > > > AFAICS do_syscall_64() doesn't touch R10 at all. So how does it reload R10? > > > > In fact do_syscall_64() as a C function does not touch R10, R11, R12, R13, R14, > > R15 - it passes their values through. > > > > What am I missing? > > The syscall ABI uses R10 for the 4th argument instead of RCX, because > RCX gets clobbered by the SYSCALL instruction for RIP. But we only reload the syscall-entry value of R10 it into RCX (4th C function argument): regs->ax = sys_call_table[nr]( regs->di, regs->si, regs->dx, regs->r10, regs->r8, regs->r9); while RCX is a clobbered register, so in practice, while it will be briefly present in do_syscall_64() and the high level syscall functions, the value in RCX will be cleared from RCX in the overwhelming majority of cases. But the real R10 will survive much longer, because it's only used in a very small minority of the C functions! So my point: if we clear R10 (and R11) from the _real_ registers, we can stop propagating these user controlled values further into the kernel. Thanks, Ingo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 19:48 ` Ingo Molnar @ 2018-02-05 20:25 ` Andy Lutomirski 0 siblings, 0 replies; 18+ messages in thread From: Andy Lutomirski @ 2018-02-05 20:25 UTC (permalink / raw) To: Ingo Molnar Cc: Brian Gerst, Andy Lutomirski, Linus Torvalds, Dan Williams, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin On Mon, Feb 5, 2018 at 7:48 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * Brian Gerst <brgerst@gmail.com> wrote: > >> On Mon, Feb 5, 2018 at 1:29 PM, Ingo Molnar <mingo@kernel.org> wrote: >> > >> > * Andy Lutomirski <luto@kernel.org> wrote: >> > >> >> [...] Clearing R10 is mostly useless in the syscall path because we'll just >> >> unconditionally reload it in do_syscall_64(). >> > >> > AFAICS do_syscall_64() doesn't touch R10 at all. So how does it reload R10? >> > >> > In fact do_syscall_64() as a C function does not touch R10, R11, R12, R13, R14, >> > R15 - it passes their values through. >> > >> > What am I missing? >> >> The syscall ABI uses R10 for the 4th argument instead of RCX, because >> RCX gets clobbered by the SYSCALL instruction for RIP. > > But we only reload the syscall-entry value of R10 it into RCX (4th C function > argument): > > regs->ax = sys_call_table[nr]( > regs->di, regs->si, regs->dx, > regs->r10, regs->r8, regs->r9); > > while RCX is a clobbered register, so in practice, while it will be briefly > present in do_syscall_64() and the high level syscall functions, the value in RCX > will be cleared from RCX in the overwhelming majority of cases. > > But the real R10 will survive much longer, because it's only used in a very small > minority of the C functions! Yes, indeed, brain fart on my part. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 16:26 ` Ingo Molnar 2018-02-05 16:38 ` Andy Lutomirski @ 2018-02-05 20:05 ` Ingo Molnar 2018-02-06 8:48 ` Ingo Molnar 1 sibling, 1 reply; 18+ messages in thread From: Ingo Molnar @ 2018-02-05 20:05 UTC (permalink / raw) To: Linus Torvalds Cc: Dan Williams, Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin * Ingo Molnar <mingo@kernel.org> wrote: > Btw., to underline these arguments, here's some statistical data about actual > register usage the x86 kernel. So this caveat was bothering me: > List of caveats/notes: > > Note #4: > The 'avg uses per fn' number is over-estimates the real uses per function, > because I counted total number of uses, not rounded down to a per function > register usage heat-map. I believe this does not change the _ordering_ of the > register usage histograms, so it's a valid simplification. as averaging all uses over the functions over-estimates the per function usage of registers significantly, so I implemented a real, per function register usage tracking. For the x86 defconfig kernel the results are: r11: used in 1704 fns, not used in 43310 fns, usage ratio: 3.8% r10: used in 3809 fns, not used in 41205 fns, usage ratio: 8.5% r15: used in 6599 fns, not used in 38415 fns, usage ratio: 14.7% r9: used in 8120 fns, not used in 36894 fns, usage ratio: 18.0% r14: used in 9243 fns, not used in 35771 fns, usage ratio: 20.5% r8: used in 12614 fns, not used in 32400 fns, usage ratio: 28.0% r13: used in 12708 fns, not used in 32306 fns, usage ratio: 28.2% r12: used in 17144 fns, not used in 27870 fns, usage ratio: 38.1% rbp: used in 23289 fns, not used in 21725 fns, usage ratio: 51.7% rcx: used in 23897 fns, not used in 21117 fns, usage ratio: 53.1% rbx: used in 29226 fns, not used in 15788 fns, usage ratio: 64.9% rdx: used in 33205 fns, not used in 11809 fns, usage ratio: 73.8% rsi: used in 35415 fns, not used in 9599 fns, usage ratio: 78.7% rdi: used in 40628 fns, not used in 4386 fns, usage ratio: 90.3% rax: used in 43120 fns, not used in 1894 fns, usage ratio: 95.8% As a comparison, here's the previous estimate: # nr of =y .config options: 1255 # nr of functions: 45490 # nr of instructions: 1963956 # nr of register uses: 3183680 register | # of uses | avg uses per fn -------------------------------------- %r11 | 11608 | 0.3 %r10 | 23398 | 0.5 %r9 | 37431 | 0.8 %r8 | 56140 | 1.2 %r15 | 77468 | 1.7 %r14 | 89285 | 2.0 %r13 | 111665 | 2.5 %r12 | 151977 | 3.3 %rcx | 166425 | 3.7 %rsi | 226536 | 5.0 %rbp | 238286 | 5.2 %rdi | 306709 | 6.7 %rdx | 313569 | 6.9 %rbx | 349496 | 7.7 %rax | 728036 | 16.0 The ordering of the registers is similar, but it's not the same - and the actual percentages changed drastically. In particular R10 and R11 are used only in a small minority of the functions: r11: used in 1704 fns, not used in 43310 fns, usage ratio: 3.8% r10: used in 3809 fns, not used in 41205 fns, usage ratio: 8.5% and if deeper inside the kernel a Spectre-gadget uses R10 or R11 then chances are high that user-space R10/R11 values will be passed through to it without disturbance from other C functions. So I think these numbers increase the urgency of clearing R10/R11 as well. Note that this too is still only a statistical estimate, the _real_ impact could only be judged if we created a call graph of the whole kernel, with 100% coverage, and analyzed register use through this tree. The other registers: r15: used in 6599 fns, not used in 38415 fns, usage ratio: 14.7% r9: used in 8120 fns, not used in 36894 fns, usage ratio: 18.0% r14: used in 9243 fns, not used in 35771 fns, usage ratio: 20.5% r8: used in 12614 fns, not used in 32400 fns, usage ratio: 28.0% r13: used in 12708 fns, not used in 32306 fns, usage ratio: 28.2% r12: used in 17144 fns, not used in 27870 fns, usage ratio: 38.1% R12-R15 will be cleared with tis patchset, so that angle should be handled, which leaves us: r9: used in 8120 fns, not used in 36894 fns, usage ratio: 18.0% r8: used in 12614 fns, not used in 32400 fns, usage ratio: 28.0% These are arguments #5 and #6 of the x86-64 C calling convention ABI. In practice they will be overwritten in many code paths - especially they will be overwritten in some of the richer ABIs of Linux: ioctls, socketcalls and the other high level demux system calls. R8 is also spontaneously used by GCC once a function's size grows about ~130 instructions. For R9 the average threshold is about ~160 instructions. Thanks, Ingo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels 2018-02-05 20:05 ` Ingo Molnar @ 2018-02-06 8:48 ` Ingo Molnar 0 siblings, 0 replies; 18+ messages in thread From: Ingo Molnar @ 2018-02-06 8:48 UTC (permalink / raw) To: Linus Torvalds Cc: Dan Williams, Brian Gerst, Thomas Gleixner, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar, Andy Lutomirski, H. Peter Anvin * Ingo Molnar <mingo@kernel.org> wrote: > [...] so I implemented a real, per function register usage tracking. > > For the x86 defconfig kernel the results are: > > r11: used in 1704 fns, not used in 43310 fns, usage ratio: 3.8% > r10: used in 3809 fns, not used in 41205 fns, usage ratio: 8.5% > r15: used in 6599 fns, not used in 38415 fns, usage ratio: 14.7% > r9: used in 8120 fns, not used in 36894 fns, usage ratio: 18.0% > r14: used in 9243 fns, not used in 35771 fns, usage ratio: 20.5% > r8: used in 12614 fns, not used in 32400 fns, usage ratio: 28.0% > r13: used in 12708 fns, not used in 32306 fns, usage ratio: 28.2% > r12: used in 17144 fns, not used in 27870 fns, usage ratio: 38.1% > rbp: used in 23289 fns, not used in 21725 fns, usage ratio: 51.7% > rcx: used in 23897 fns, not used in 21117 fns, usage ratio: 53.1% > rbx: used in 29226 fns, not used in 15788 fns, usage ratio: 64.9% > rdx: used in 33205 fns, not used in 11809 fns, usage ratio: 73.8% > rsi: used in 35415 fns, not used in 9599 fns, usage ratio: 78.7% > rdi: used in 40628 fns, not used in 4386 fns, usage ratio: 90.3% > rax: used in 43120 fns, not used in 1894 fns, usage ratio: 95.8% So here's the next (and probably final) chapter of x86-64 register allocation statistics: out of curiosity I let this analysis run overnight on all 4 kernel configs, to see the register usage patterns of the distro and allyesconfig kernels as well. Here's all the per function register allocation probabilities in a single table: REG allnoconfig localconfig distroconfig allyesconfig -------------------------------------------------------------------------- rax: 94.6% 95.8% 94.3% 96.2% rbx: 46.9% 64.9% 67.6% 90.4% rcx: 47.8% 53.1% 57.9% 52.7% rdx: 66.0% 73.8% 76.0% 74.3% rbp: 36.2% 51.7% 55.5% 81.5% rsi: 64.8% 78.7% 81.3% 85.0% rdi: 79.9% 90.3% 92.1% 94.3% r8: 21.9% 28.0% 31.9% 29.7% r9: 13.9% 18.0% 20.4% 18.3% r10: 9.3% 8.5% 8.4% 4.7% r11: 4.9% 3.8% 4.5% 1.6% r12: 25.6% 38.1% 42.4% 69.3% r13: 18.3% 28.2% 31.5% 57.1% r14: 13.3% 20.5% 22.8% 46.1% r15: 9.3% 14.7% 16.4% 36.6% These numbers underline the overall conclusions that we have reached so far: - We should clear all of R10-R15 in syscalls and R8-R15 in parameter-less entries (IRQs, NMIs, exceptions, etc.) - like the latest series from Dan does. - We should probably strive to clear R8-R9 for system calls that don't use it - which is ~98% of them. In particular R9 with its comparatively low (~20%) allocation probability could survive deep into the kernel: 5-deep call chains still have a ~30% chance to have R9 intact - and call chains as deep as 10 could still realistically have a ~10% residual probability to have R9 intact. We don't do this yet. - Smaller kernels are statistically easier to attack via Spectre, as long as the gadget is present in the smaller kernel. In particular heavily stripped down 64-bit kernels might be attackable via R8-R9 (21%,14%) and also RBP (36%) to a certain degree. This means that the RBP clearing introduced by this series is very much relevant: because RBP is not part of the C function call calling arguments ABI its allocation frequency is much lower than that of other GP registers. Unfortunately R8/R9 values will survive through system calls, because we restore them in do_syscall_64(). There's a somewhat surprising pattern as well: the register allocation probability of R10 and R11 _decreases_ as the kernel gets more complex. For all other registers the allocation probability increases with increasing kernel complexity, which is intuitive: larger functions with higher register pressure will use more registers. So this result is counter-intuitive - my best guess is that it's some sort of GCC register allocation artifact. Here's the comparison of the code generation of a distro versus an allyesconfig kernel: # distro-config allyesconfig # # nr of =y .config options: 4871 9553 # nr of functions: 190477 249340 # nr of instructions: 10329411 20223765 # nr of register uses: 16907185 33413619 # # instructions per function: 54 81 # # # r10 used in: 15404 fns 11300 fns # r10 not used in: 167714 fns 228114 fns # r10 usage ratio: 8.4% 4.7% # # r11 used in: 8224 fns 3876 fns # r11 not used in: 174894 fns 235538 fns # r11 usage ratio: 4.5% 1.6% I don't know which kernel option (out of thousands) causes R10/R11 to be used much less frequently in a significantly larger kernel. Note that even the absolute count of functions with R10/R11 use decreases in the allyesconfig kernel, so I don't think it can be caused by the extra instrumentation bloat of features like CONFIG_GCOV_KERNEL=y. The basic inlining and optimization settings are the same and neither has branch-instrumentation enabled: # distro-config allyesconfig # CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_OPTIMIZE_INLINING=y CONFIG_OPTIMIZE_INLINING=y CONFIG_BRANCH_PROFILE_NONE=y CONFIG_BRANCH_PROFILE_NONE=y While no-one will build and boot an allyesconfig kernel (other than me), the numbers are still indicative: we should keep in mind the possibility that a Linux distro enabling seemingly benign non-default kernel options can lower the allocation probability of R10/R11 significantly. Thanks, Ingo ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] x86/entry: Clear registers for 64bit exceptions/interrupts 2018-02-03 23:21 [PATCH 0/3] x86/entry: clear registers to sanitize speculative usages Dan Williams 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams @ 2018-02-03 23:21 ` Dan Williams 2018-02-03 23:21 ` [PATCH 3/3] x86/entry: Clear registers for compat syscalls Dan Williams 2 siblings, 0 replies; 18+ messages in thread From: Dan Williams @ 2018-02-03 23:21 UTC (permalink / raw) To: tglx; +Cc: Andi Kleen, torvalds, linux-kernel, luto From: Andi Kleen <ak@linux.intel.com> Clear the 'extra' registers on entering the 64bit kernel for exceptions and interrupts. The common registers are not cleared since they are likely clobbered well before they can be exploited in a speculative execution attack. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/entry/entry_64.S | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 46260e951da6..d73eedf1eb47 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -563,6 +563,7 @@ END(irq_entries_start) ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS + CLEAR_EXTRA_REGS_NOSPEC ENCODE_FRAME_POINTER testb $3, CS(%rsp) @@ -1121,6 +1122,7 @@ ENTRY(xen_failsafe_callback) ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS + CLEAR_EXTRA_REGS_NOSPEC ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback) @@ -1166,6 +1168,7 @@ ENTRY(paranoid_entry) cld SAVE_C_REGS 8 SAVE_EXTRA_REGS 8 + CLEAR_EXTRA_REGS_NOSPEC ENCODE_FRAME_POINTER 8 movl $1, %ebx movl $MSR_GS_BASE, %ecx @@ -1218,6 +1221,7 @@ ENTRY(error_entry) cld SAVE_C_REGS 8 SAVE_EXTRA_REGS 8 + CLEAR_EXTRA_REGS_NOSPEC ENCODE_FRAME_POINTER 8 xorl %ebx, %ebx testb $3, CS+8(%rsp) @@ -1416,6 +1420,7 @@ ENTRY(nmi) pushq %r14 /* pt_regs->r14 */ pushq %r15 /* pt_regs->r15 */ UNWIND_HINT_REGS + CLEAR_EXTRA_REGS_NOSPEC ENCODE_FRAME_POINTER /* ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/3] x86/entry: Clear registers for compat syscalls 2018-02-03 23:21 [PATCH 0/3] x86/entry: clear registers to sanitize speculative usages Dan Williams 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams 2018-02-03 23:21 ` [PATCH 2/3] x86/entry: Clear registers for 64bit exceptions/interrupts Dan Williams @ 2018-02-03 23:21 ` Dan Williams 2 siblings, 0 replies; 18+ messages in thread From: Dan Williams @ 2018-02-03 23:21 UTC (permalink / raw) To: tglx Cc: Andi Kleen, x86, linux-kernel, Ingo Molnar, luto, H. Peter Anvin, torvalds From: Andi Kleen <ak@linux.intel.com> At entry userspace may have populated registers with values that could be useful in a speculative execution attack. Clear them to minimize the kernel's attack surface. [djbw: rename the macro, only clear the extra registers] Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/entry/entry_64_compat.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 98d5358e4041..f55b018a580b 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -95,6 +95,7 @@ ENTRY(entry_SYSENTER_compat) pushq $0 /* pt_regs->r14 = 0 */ pushq $0 /* pt_regs->r15 = 0 */ cld + CLEAR_EXTRA_REGS_NOSPEC /* * SYSENTER doesn't filter flags, so we need to clear NT and AC @@ -223,6 +224,7 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe) pushq $0 /* pt_regs->r13 = 0 */ pushq $0 /* pt_regs->r14 = 0 */ pushq $0 /* pt_regs->r15 = 0 */ + CLEAR_EXTRA_REGS_NOSPEC /* * User mode is traced as though IRQs are on, and SYSENTER @@ -348,6 +350,7 @@ ENTRY(entry_INT80_compat) pushq %r14 /* pt_regs->r14 */ pushq %r15 /* pt_regs->r15 */ cld + CLEAR_EXTRA_REGS_NOSPEC /* * User mode is traced as though IRQs are on, and the interrupt ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-02-06 8:48 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-03 23:21 [PATCH 0/3] x86/entry: clear registers to sanitize speculative usages Dan Williams 2018-02-03 23:21 ` [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels Dan Williams 2018-02-04 0:14 ` Andy Lutomirski 2018-02-04 1:25 ` Dan Williams 2018-02-04 1:29 ` Andy Lutomirski 2018-02-04 13:01 ` Brian Gerst 2018-02-04 17:42 ` Dan Williams 2018-02-04 18:40 ` Linus Torvalds 2018-02-05 16:26 ` Ingo Molnar 2018-02-05 16:38 ` Andy Lutomirski 2018-02-05 18:29 ` Ingo Molnar 2018-02-05 18:47 ` Brian Gerst 2018-02-05 19:48 ` Ingo Molnar 2018-02-05 20:25 ` Andy Lutomirski 2018-02-05 20:05 ` Ingo Molnar 2018-02-06 8:48 ` Ingo Molnar 2018-02-03 23:21 ` [PATCH 2/3] x86/entry: Clear registers for 64bit exceptions/interrupts Dan Williams 2018-02-03 23:21 ` [PATCH 3/3] x86/entry: Clear registers for compat syscalls Dan Williams
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.