Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
@ 2018-01-26  2:50 Liran Alon
  2018-01-26  2:55   ` Van De Ven, Arjan
  0 siblings, 1 reply; 120+ messages in thread
From: Liran Alon @ 2018-01-26  2:50 UTC (permalink / raw)
  To: dave.hansen
  Cc: labbott, luto, Janakarajan.Natarajan, torvalds, bp,
	asit.k.mallick, rkrcmar, karahmed, hpa, mingo, jun.nakajima, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, dwmw2, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, kvm, aarcange

----- dave.hansen@intel.com wrote:

> On 01/25/2018 06:11 PM, Liran Alon wrote:
> > It is true that attacker cannot speculate to a kernel-address, but it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerability to build a reliable exploit.
> 
> The address doesn't leak if you can't execute there.  It's the same
> reason that we don't worry about speculation to user addresses from the
> kernel when SMEP is in play.

Maybe I misunderstand BTB & BHB internals. I would be glad if you could pinpoint my error.

The Google P0 blog post (https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-with-side.html) claims that the BTB & BHB use only some of the low 31 bits of the source instruction's address to look up the BTB. In addition, it claims that the higher bits of the predicted destination change together with the higher bits of the source instruction.

Therefore, it should be possible to leak the low bits of BTB/BHB entries created by high-prediction-mode code from low-prediction-mode code, because the predicted destination address will reside in user space.
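A toy C model of the aliasing argument above, for illustration only. The 31-bit source mask and the 32-bit stored-target width are assumptions loosely based on the Project Zero description, not documented Intel behaviour, and btb_train()/btb_predict() are hypothetical names. Real BTBs are set-associative and hash the address, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS for illustration, not documented hardware values. */
#define BTB_SRC_MASK    0x7fffffffULL /* only low 31 source bits tag an entry */
#define BTB_TARGET_MASK 0xffffffffULL /* only low 32 target bits are stored */

struct btb_entry {
    uint64_t tag;    /* truncated source address */
    uint64_t target; /* truncated target address */
    int valid;
};

/* Hypothetical: record an indirect branch src -> dst (e.g. run by the kernel). */
static void btb_train(struct btb_entry *e, uint64_t src, uint64_t dst)
{
    e->tag = src & BTB_SRC_MASK;
    e->target = dst & BTB_TARGET_MASK;
    e->valid = 1;
}

/* Hypothetical: predict the target of a branch at src; 0 means "no prediction".
 * The upper bits of the prediction are taken from the *source* address. */
static uint64_t btb_predict(const struct btb_entry *e, uint64_t src)
{
    assert(e != NULL);
    if (!e->valid || e->tag != (src & BTB_SRC_MASK))
        return 0;
    return (src & ~BTB_TARGET_MASK) | e->target;
}
```

Training the entry with a kernel branch (0xffffffff81234567 -> 0xffffffff81abcdef) and querying it from the user address 0x01234567, which shares the low 31 bits, yields 0x81abcdef: a user-space address carrying the low 32 bits of the kernel target. Whether real hardware behaves like this model is exactly the point under discussion.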

What am I missing?

Thanks,
-Liran

* RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26  2:50 [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation Liran Alon
@ 2018-01-26  2:55   ` Van De Ven, Arjan
  0 siblings, 0 replies; 120+ messages in thread
From: Van De Ven, Arjan @ 2018-01-26  2:55 UTC (permalink / raw)
  To: Liran Alon, Hansen, Dave
  Cc: labbott, luto, Janakarajan.Natarajan, torvalds, bp, Mallick,
	Asit K, rkrcmar, karahmed, hpa, mingo, Nakajima, Jun, x86, Raj,
	Ashok, tim.c.chen, pbonzini, ak, linux-kernel, dwmw2, peterz,
	tglx, gregkh, mhiramat, arjan, thomas.lendacky, Williams, Dan J,
	joro, kvm, aarcange





> -----Original Message-----
> From: Liran Alon [mailto:liran.alon@oracle.com]
> Sent: Thursday, January 25, 2018 6:50 PM
> To: Hansen, Dave <dave.hansen@intel.com>
> Cc: labbott@redhat.com; luto@kernel.org; Janakarajan.Natarajan@amd.com;
> torvalds@linux-foundation.org; bp@suse.de; Mallick, Asit K
> <asit.k.mallick@intel.com>; rkrcmar@redhat.com; karahmed@amazon.de;
> hpa@zytor.com; mingo@redhat.com; Nakajima, Jun
> <jun.nakajima@intel.com>; x86@kernel.org; Raj, Ashok <ashok.raj@intel.com>;
> Van De Ven, Arjan <arjan.van.de.ven@intel.com>; tim.c.chen@linux.intel.com;
> pbonzini@redhat.com; ak@linux.intel.com; linux-kernel@vger.kernel.org;
> dwmw2@infradead.org; peterz@infradead.org; tglx@linutronix.de;
> gregkh@linuxfoundation.org; mhiramat@kernel.org; arjan@linux.intel.com;
> thomas.lendacky@amd.com; Williams, Dan J <dan.j.williams@intel.com>;
> joro@8bytes.org; kvm@vger.kernel.org; aarcange@redhat.com
> Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect
> Branch Speculation
> 
> 
 
> Google P0 blog-post
> (https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-
> with-side.html) claims that BTB & BHB only use <31 low bits of the address of
> the source instruction to lookup into the BTB. In addition, it claims that the
> higher bits of the predicated destination change together with the higher bits of
> the source instruction.
> 
> Therefore, it should be possible to leak the low bits of high predicition-mode
> code BTB/BHB entries from low prediction-mode code. Because the predicted
> destination address will reside in user-space.
> 
> What am I missing?


I thought this email thread was about the RSB...


* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-02-04 18:43                         ` Thomas Gleixner
@ 2018-02-06  9:14                           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-02-06  9:14 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
	Arjan Van De Ven

On Sun, 2018-02-04 at 19:43 +0100, Thomas Gleixner wrote:
> Yet another possibility is to avoid the function entry and accouting magic
> and use the generic gcc return thunk:
> 
> __x86_return_thunk:
>         call L2
> L1:
>         pause
>         lfence
>         jmp L1
> L2:
>         lea 8(%rsp), %rsp|lea 4(%esp), %esp
>         ret
> 
> which basically refills the RSB on every return. That can be inline or
> extern, but in both cases we should be able to patch it out.
> 
> I have no idea how that affects performance, but it might be worthwhile to
> experiment with that.

That was what I had in mind when I asked HJ to add -mfunction-return.

I suspect the performance hit would be significant because it would
cause a prediction miss on *every* return.

But as I said, let's implement what we can without IBRS for Skylake,
then we can compare the two options for performance, security coverage
and general fugliness.

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-02-04 18:43                         ` Thomas Gleixner
@ 2018-02-04 20:22                           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-02-04 20:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
	Arjan Van De Ven

On Sun, 2018-02-04 at 19:43 +0100, Thomas Gleixner wrote:
> 
> __x86_return_thunk would look like this:
> 
> __x86_return_thunk:
>         testl   $0xf, PER_CPU_VAR(call_depth)
>         jnz     1f      
>         stuff_rsb
>    1:
>         decl    PER_CPU_VAR(call_depth)
>         ret
> 
> The call_depth variable would be reset on context switch.

Note that the 'jnz' can be predicted taken there, allowing the CPU to
speculate all the way to the 'ret'... and beyond.


* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 10:23                       ` Ingo Molnar
@ 2018-02-04 18:43                         ` Thomas Gleixner
  -1 siblings, 0 replies; 120+ messages in thread
From: Thomas Gleixner @ 2018-02-04 18:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Woodhouse, Linus Torvalds, KarimAllah Ahmed,
	Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
	Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick,
	Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu,
	Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
	Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
	Arjan Van De Ven

On Tue, 23 Jan 2018, Ingo Molnar wrote:
> * David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> > > obviously all this code and data would be very cache hot. Given that the average 
> > > number of function calls per system call is around a dozen, this would be _much_ 
> > > faster than any microcode/MSR based approach.
> > 
> > That's kind of neat, except you don't want it at the top of the
> > function; you want it at the bottom.
> > 
> > If you could hijack the *return* site, then you could check for
> > underflow and stuff the RSB right there. But in __fentry__ there's not
> > a lot you can do other than complain that something bad is going to
> > happen in the future. You know that a string of 16+ rets is going to
> > happen, but you've got no gadget in *there* to deal with it when it
> > does.
> 
> No, it can be done with the existing CALL instrumentation callback that 
> CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from 
> the CALL trampoline - see my previous email.
> 
> > HJ did have patches to turn 'ret' into a form of retpoline, which I
> > don't think ever even got performance-tested.
> 
> Return instrumentation is possible as well, but there are two major drawbacks:
> 
>  - GCC support for it is not as widely available and return instrumentation is 
>    less tested in Linux kernel contexts
> 
>  - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already 
>    enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs 
>    would be literally zero, while still allowing to fix the RSB vulnerability on 
>    SkyLake.

I played around with that a bit during the week and it turns out to be less
simple than you thought.

1) Injecting a trampoline return only works for functions which have all
   arguments in registers. For functions with arguments on the stack, like all
   varargs functions, this breaks because the function won't find its
   arguments anymore.

   I have not yet found a way to reliably figure out which functions have
   arguments on the stack. One option might be to simply ignore them.

   The workaround is to replace the original return address on the stack
   with the trampoline and store the original return address in a per-thread
   stack, which I implemented. But performance-wise this sucks badly.

2) Doing the whole dance on function entry has a real downside because you
   refill the RSB on every 15th return whether or not it's required. That
   causes a very prominent performance hit.
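A minimal sketch of the per-thread return stack workaround from item 1, assuming a fixed-depth array per thread; struct ret_stack, ret_stack_push(), ret_stack_pop() and RET_STACK_DEPTH are hypothetical names. Overwriting the existing return slot, rather than pushing a new frame, keeps stack-passed arguments at their expected offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RET_STACK_DEPTH 64 /* hypothetical per-thread capacity */

/* Hypothetical per-thread shadow stack of real return addresses. */
struct ret_stack {
    uint64_t slot[RET_STACK_DEPTH];
    unsigned int top; /* number of saved return addresses */
};

/* Function entry: save the real return address and overwrite the stack
 * slot with the trampoline. Returns false when the shadow stack is full,
 * in which case the original return address is left untouched. */
static bool ret_stack_push(struct ret_stack *rs, uint64_t *ret_slot,
                           uint64_t trampoline)
{
    if (rs->top == RET_STACK_DEPTH)
        return false;
    rs->slot[rs->top++] = *ret_slot;
    *ret_slot = trampoline;
    return true;
}

/* Trampoline: after any RSB handling, recover the real return address. */
static uint64_t ret_stack_pop(struct ret_stack *rs)
{
    assert(rs->top > 0);
    return rs->slot[--rs->top];
}
```

Every instrumented call now costs an extra store and every return an extra load plus an indirect jump, which is one plausible contributor to the poor performance reported above.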

An alternative idea is to do the following (not yet implemented):

__fentry__:
	incl	PER_CPU_VAR(call_depth)
	retq

and use -mfunction-return=thunk-extern, which is available on
retpoline-enabled compilers. That's a reasonable requirement because w/o
retpoline the whole SKL magic is pointless anyway.

-mfunction-return=thunk-extern emits

	jmp	__x86_return_thunk

instead of ret. In the thunk we can do the whole shebang of mitigation.
That jump can be identified at build time and it can be patched into a ret
for unaffected CPUs. Ideally we do the patching at build time and only
patch the jump in when SKL is detected or paranoia requests it.
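A minimal sketch of the jmp-to-ret patching idea, assuming each site is a 5-byte `jmp rel32` (opcode 0xe9, displacement relative to the end of the instruction) and that the leftover bytes are filled with int3 (0xcc). patch_return_thunk_site() is a hypothetical helper working on a byte buffer, not the kernel's alternatives machinery; patching the other way (ret to jmp when SKL is detected) would be the mirror image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_OPCODE 0xe9
#define JMP_REL32_LEN    5
#define RET_OPCODE       0xc3
#define INT3_OPCODE      0xcc

/* Hypothetical: 'text' holds code that lives at virtual address 'base'.
 * If the bytes at 'off' are "jmp thunk", rewrite them as "ret; int3 x4".
 * Returns true if the site was patched. */
static bool patch_return_thunk_site(uint8_t *text, size_t len, uint64_t base,
                                    size_t off, uint64_t thunk)
{
    int32_t rel;

    if (off + JMP_REL32_LEN > len || text[off] != JMP_REL32_OPCODE)
        return false;

    /* rel32 is little-endian; memcpy assumes a little-endian host (x86). */
    memcpy(&rel, &text[off + 1], sizeof(rel));

    /* The displacement is relative to the end of the jmp instruction. */
    if (base + off + JMP_REL32_LEN + (int64_t)rel != thunk)
        return false;

    text[off] = RET_OPCODE;
    memset(&text[off + 1], INT3_OPCODE, JMP_REL32_LEN - 1);
    return true;
}
```

Checking the decoded target before patching avoids rewriting an unrelated jmp that merely happens to sit at a listed offset.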

We could actually look into that for tracing as well. The only reason why
we don't do that is to select the ideal nop for the CPU the kernel runs on,
which obviously cannot be known at build time.

__x86_return_thunk would look like this:

__x86_return_thunk:
	testl	$0xf, PER_CPU_VAR(call_depth)
	jnz	1f
	stuff_rsb
1:
	decl	PER_CPU_VAR(call_depth)
	ret

The call_depth variable would be reset on context switch.
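A toy model of the call_depth accounting in the thunk above. It only reproduces the counter arithmetic (stuff the RSB when the low four bits are zero, then decrement) and says nothing about how the CPU speculates through it; struct depth_model and the model_*() helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU state. */
struct depth_model {
    unsigned int call_depth; /* models PER_CPU_VAR(call_depth) */
    unsigned int rsb_stuffs; /* how often stuff_rsb would have run */
};

/* __fentry__: incl PER_CPU_VAR(call_depth) */
static void model_fentry(struct depth_model *m)
{
    m->call_depth++;
}

/* __x86_return_thunk: testl $0xf / jnz 1f / stuff_rsb / 1: decl */
static void model_return_thunk(struct depth_model *m)
{
    if ((m->call_depth & 0xf) == 0)
        m->rsb_stuffs++;
    m->call_depth--;
}

/* Context switch resets the counter. */
static void model_context_switch(struct depth_model *m)
{
    m->call_depth = 0;
}
```

With 20 nested calls followed by 20 returns the RSB is stuffed once (at depth 16); with 32 it is stuffed twice (at depths 32 and 16).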

Though that has another problem: tail calls. A tail call invokes the
__fentry__ call of the tail-called function without a matching return,
which makes the call_depth counter unbalanced. Tail calls can be prevented
by using -fno-optimize-sibling-calls, but that probably sucks as well.
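A C illustration of the tail-call imbalance. fentry() and return_thunk() are hypothetical stand-ins for __fentry__ and __x86_return_thunk, and caller_tail() mimics a sibling-call-optimised caller, whose own return is folded into the callee's.

```c
#include <assert.h>

static unsigned int call_depth; /* hypothetical stand-in for the per-CPU counter */

static void fentry(void)       { call_depth++; } /* models __fentry__ */
static void return_thunk(void) { call_depth--; } /* models __x86_return_thunk */

/* A normal function: counted on entry, uncounted on return. */
static void callee(void)
{
    fentry();
    return_thunk();
}

/* Caller compiled WITH sibling-call optimisation: it effectively does
 * "jmp callee", so callee's return doubles as the caller's return. */
static void caller_tail(void)
{
    fentry();
    callee();
}

/* Caller compiled with -fno-optimize-sibling-calls. */
static void caller_no_tail(void)
{
    fentry();
    callee();
    return_thunk();
}
```

caller_no_tail() leaves the counter at zero, while caller_tail() leaves it at one: two entries were counted but only one return went through the thunk.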

Yet another possibility is to avoid the function entry and accounting magic
and use the generic gcc return thunk:

__x86_return_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp	/* 32-bit: lea 4(%esp), %esp */
	ret

which basically refills the RSB on every return. That can be inline or
extern, but in both cases we should be able to patch it out.
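A toy model of what the generic thunk does to the RSB. The 16-entry size and drop-oldest overflow policy are assumptions, and rsb_push(), rsb_pop() and thunk_return() are hypothetical. The thunk's own `call` pushes the address of the pause/lfence loop, the `lea` discards it from the real stack, and the final `ret` is predicted from the RSB while architecturally returning to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_ENTRIES 16 /* assumed size */

struct rsb {
    uint64_t e[RSB_ENTRIES];
    unsigned int n; /* valid entries */
};

/* call: push a predicted return address, dropping the oldest when full */
static void rsb_push(struct rsb *r, uint64_t ret)
{
    if (r->n == RSB_ENTRIES) {
        for (unsigned int i = 1; i < RSB_ENTRIES; i++)
            r->e[i - 1] = r->e[i];
        r->n--;
    }
    r->e[r->n++] = ret;
}

/* ret: pop the prediction, or 0 on underflow */
static uint64_t rsb_pop(struct rsb *r)
{
    return r->n ? r->e[--r->n] : 0;
}

/* One "ret" routed through the generic thunk. Returns 1 on mispredict. */
static int thunk_return(struct rsb *r, uint64_t trap_addr, uint64_t actual_ret)
{
    rsb_push(r, trap_addr);            /* call L2 pushes L1 (the trap) */
    return rsb_pop(r) != actual_ret;   /* ret predicts L1, goes to caller */
}
```

In this model the caller's original RSB entry survives, so the RSB does not drain, but the thunk's `ret` is always predicted into the pause/lfence trap, i.e. a prediction miss on every return.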

I have no idea how that affects performance, but it might be worthwhile to
experiment with that.

If nobody beats me to it, I'll play around with that some more after
vacation.

Thanks,

	tglx

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-25 17:16                         ` Greg Kroah-Hartman
@ 2018-01-29 11:59                           ` Mason
  -1 siblings, 0 replies; 120+ messages in thread
From: Mason @ 2018-01-29 11:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: LKML, Linux ARM, Marc Zyngier, Will Deacon, Arnd Bergmann

[ Dropping large CC list ]

On 25/01/2018 18:16, Greg Kroah-Hartman wrote:

> On Thu, Jan 25, 2018 at 05:19:04PM +0100, Mason wrote:
> 
>> On 23/01/2018 10:30, David Woodhouse wrote:
>>
>>> Skylake takes predictions from the generic branch target buffer when
>>> the RSB underflows.
>>
>> Adding LAKML.
>>
>> AFAIU, some ARM Cortex cores have the same optimization.
>> (A9 maybe, A17 probably, some recent 64-bit cores)
>>
>> Are there software work-arounds for Spectre planned for arm32 and arm64?
> 
> Yes, I think they are currently buried in one of the arm64 trees, and
> they have been posted to the mailing list a few times in the past.

Found the burial ground, thanks Greg.

  https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti

Via https://developer.arm.com/support/security-update

"For Cortex-R8, Cortex-A8, Cortex-A9, and Cortex-A17, invalidate
the branch predictor using a BPIALL instruction."

The latest arm32 patch series was submitted recently:

  https://www.spinics.net/lists/arm-kernel/msg630892.html

Regards.

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-27 13:42               ` Konrad Rzeszutek Wilk
@ 2018-01-27 15:55                 ` Dave Hansen
  -1 siblings, 0 replies; 120+ messages in thread
From: Dave Hansen @ 2018-01-27 15:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Andi Kleen, Linus Torvalds,
	David Woodhouse, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Nakajima, Jun, Ingo Molnar,
	the arch/x86 maintainers, Raj, Ashok, Van De Ven, Arjan,
	Tim Chen, Paolo Bonzini, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Williams,
	Dan J, Joerg Roedel, Andrea Arcangeli, KVM list, Boris Ostrovsky

On 01/27/2018 05:42 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 26, 2018 at 07:11:47PM +0000, Hansen, Dave wrote:
>> The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated.  I've tried to write it all down in one place: https://goo.gl/pXbvBE
> Thank you for sharing that.
> 
> One question on the third from the top (' RSB Stuff (16) After
> irq/nmi/#PF/...').
> 
> It says that :"Return from interrupt path (more than 16 deep) can empty
> RSB".
> 
> Just to clarify - you mean all the returns ('ret') that are happening after
> we call do_IRQ and the stack unwinds - but before we do an 'iret' correct?

Correct.  The RSB is not used or updated by iret.
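A toy model of the "more than 16 deep" case: calls on the interrupt path push RSB entries (losing the oldest once it is full), rets consume them, and iret leaves the RSB untouched. The size and overflow policy are assumptions and the helpers are hypothetical; underflows counts returns that would fall back to the BTB on Skylake-era parts.

```c
#include <assert.h>

#define RSB_ENTRIES 16 /* assumed size */

struct rsb_model {
    unsigned int n;          /* valid entries currently in the RSB */
    unsigned int underflows; /* rets that found the RSB empty */
};

/* call: push a prediction; once full, the oldest entry is overwritten */
static void model_call(struct rsb_model *r)
{
    if (r->n < RSB_ENTRIES)
        r->n++;
}

/* ret: consume a prediction, or underflow if none is left */
static void model_ret(struct rsb_model *r)
{
    if (r->n > 0)
        r->n--;
    else
        r->underflows++;
}

/* iret: neither uses nor updates the RSB */
static void model_iret(struct rsb_model *r)
{
    (void)r;
}
```

With 5 entries from the interrupted code and a 20-deep call chain under do_IRQ, unwinding that chain underflows 4 times before the iret, and the interrupted code then underflows on each of its own 5 returns.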

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 19:11             ` Hansen, Dave
@ 2018-01-27 13:42               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 120+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-27 13:42 UTC (permalink / raw)
  To: Hansen, Dave
  Cc: Konrad Rzeszutek Wilk, Andi Kleen, Linus Torvalds,
	David Woodhouse, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Nakajima, Jun, Ingo Molnar,
	the arch/x86 maintainers, Raj, Ashok, Van De Ven, Arjan,
	Tim Chen, Paolo Bonzini, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Williams,
	Dan J, Joerg Roedel, Andrea Arcangeli, KVM list, Boris Ostrovsky

On Fri, Jan 26, 2018 at 07:11:47PM +0000, Hansen, Dave wrote:
> The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated.  I've tried to write it all down in one place: https://goo.gl/pXbvBE

Thank you for sharing that.

One question on the third item from the top ('RSB Stuff (16) After
irq/nmi/#PF/...').

It says: "Return from interrupt path (more than 16 deep) can empty
RSB".

Just to clarify, you mean all the returns ('ret') that happen after
we call do_IRQ and the stack unwinds, but before we do an 'iret', correct?

I am 99% sure that is what you mean, but I'm confirming because one could
read this as 'Need to do RSB stuffing after an iret' (say you are in the
kernel, then get an interrupt and iret back to the kernel).

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 19:02           ` Konrad Rzeszutek Wilk
@ 2018-01-26 19:11             ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 19:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andi Kleen
  Cc: Linus Torvalds, Dave Hansen, Liran Alon, Laura Abbott,
	Andrew Lutomirski, Janakarajan Natarajan, Borislav Petkov,
	Mallick, Asit K, Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Arjan van de Ven, Tom Lendacky, Dan Williams, Joerg Roedel,
	Andrea Arcangeli, KVM list, Boris Ostrovsky

On Fri, 2018-01-26 at 14:02 -0500, Konrad Rzeszutek Wilk wrote:
> 
> -ECONFUSED, see ==>
> 
> Is this incorrect then?
> I see:
> 
>          * Skylake era CPUs have a separate issue with *underflow* of the
>          * RSB, when they will predict 'ret' targets from the generic BTB.
>          * The proper mitigation for this is IBRS. If IBRS is not supported
>          * or deactivated in favour of retpolines the RSB fill on context
>          * switch is required.
>          */

No, that's correct (well, except that it's kind of written for a world
where Linus is going to let IBRS anywhere near his kernel, and could
survive being rephrased a little :)

The RSB-stuffing on context switch (or kernel entry) is one of a
*litany* of additional hacks we need on Skylake to make retpolines
safe.

We were adding the RSB-stuffing in this case *anyway* for !SMEP, so it
was trivial enough to add in the (|| Skylake) condition while we were
at it.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 19:02           ` Konrad Rzeszutek Wilk
@ 2018-01-26 19:11             ` Hansen, Dave
  -1 siblings, 0 replies; 120+ messages in thread
From: Hansen, Dave @ 2018-01-26 19:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andi Kleen
  Cc: Linus Torvalds, David Woodhouse, Liran Alon, Laura Abbott,
	Andrew Lutomirski, Janakarajan Natarajan, Borislav Petkov,
	Mallick, Asit K, Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Nakajima, Jun, Ingo Molnar,
	the arch/x86 maintainers, Raj, Ashok, Van De Ven, Arjan,
	Tim Chen, Paolo Bonzini, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Williams,
	Dan J, Joerg Roedel, Andrea Arcangeli, KVM list, Boris Ostrovsky

The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated.  I've tried to write it all down in one place: https://goo.gl/pXbvBE

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:59         ` Andi Kleen
@ 2018-01-26 19:02           ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 120+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-26 19:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, David Woodhouse, Dave Hansen, Liran Alon,
	Laura Abbott, Andrew Lutomirski, Janakarajan Natarajan,
	Borislav Petkov, Mallick, Asit K, Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Arjan van de Ven, Tom Lendacky, Dan Williams, Joerg Roedel,
	Andrea Arcangeli, KVM list, Boris Ostrovsky

On Fri, Jan 26, 2018 at 09:59:01AM -0800, Andi Kleen wrote:
> On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > >
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> > 
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> > 
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
> 
> For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
> BTB on underflow.
> 
> It's also always needed with virtualization.

-ECONFUSED, see ==>

Is this incorrect then?
I see:

         * Skylake era CPUs have a separate issue with *underflow* of the
         * RSB, when they will predict 'ret' targets from the generic BTB.
         * The proper mitigation for this is IBRS. If IBRS is not supported
         * or deactivated in favour of retpolines the RSB fill on context
         * switch is required.
         */

which came from this:

commit c995efd5a740d9cbafbf58bde4973e8b50b4d761
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Fri Jan 12 17:49:25 2018 +0000

    x86/retpoline: Fill RSB on context switch for affected CPUs
    
    On context switch from a shallow call stack to a deeper one, as the CPU
    does 'ret' up the deeper side it may encounter RSB entries (predictions for
    where the 'ret' goes to) which were populated in userspace.
    
    This is problematic if neither SMEP nor KPTI (the latter of which marks
    userspace pages as NX for the kernel) are active, as malicious code in
    userspace may then be executed speculatively.
    
    Overwrite the CPU's return prediction stack with calls which are predicted
    to return to an infinite loop, to "capture" speculation if this
    happens. This is required both for retpoline, and also in conjunction with
    IBRS for !SMEP && !KPTI.
    
    On Skylake+ the problem is slightly different, and an *underflow* of the
    RSB may cause errant branch predictions to occur. So there it's not so much
    overwrite, as *filling* the RSB to attempt to prevent it getting
    empty. This is only a partial solution for Skylake+ since there are many
==>other conditions which may result in the RSB becoming empty. The full	<==
==>solution on Skylake+ is to use IBRS, which will prevent the problem even	<==
    when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
    required on context switch.
    
    
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Arjan van de Ven <arjan@linux.intel.com>


The "full solution" is what is making me confused.
> 
> -Andi

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:44                     ` Van De Ven, Arjan
@ 2018-01-26 18:53                       ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 18:53 UTC (permalink / raw)
  To: Van De Ven, Arjan, Arjan van de Ven, Andi Kleen, Linus Torvalds
  Cc: Hansen, Dave, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář, KarimAllah Ahmed, Peter Anvin, Nakajima, Jun,
	Ingo Molnar, the arch/x86 maintainers, Raj, Ashok, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Williams, Dan J, Joerg Roedel, Andrea Arcangeli,
	KVM list

On Fri, 2018-01-26 at 18:44 +0000, Van De Ven, Arjan wrote:
> your question was specific to RSB not BTB. But please show the empirical evidence for RSB ?

We were hypothesising, which should have been clear from:

On Fri, 2018-01-26 at 09:11 +0000, David Woodhouse wrote:
> Likewise if the RSB only stores the low 31 bits of the target, SMEP
> isn't much help there either.
> 
> Do we need to look again at the fact that we've disabled the RSB-
> stuffing for SMEP?

... and later... 

On Fri, 2018-01-26 at 17:31 +0000, David Woodhouse wrote:
> Note, we've switched from talking about BTB to RSB here, so this is a
> valid concern if the *RSB* only has the low bits of the target.

I'm glad to hear that it *isn't* a valid concern for the RSB, and that
the code in Linus' tree is correct.

Thank you for clearing that up.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:43                   ` David Woodhouse
@ 2018-01-26 18:44                     ` Van De Ven, Arjan
  -1 siblings, 0 replies; 120+ messages in thread
From: Van De Ven, Arjan @ 2018-01-26 18:44 UTC (permalink / raw)
  To: David Woodhouse, Arjan van de Ven, Andi Kleen, Linus Torvalds
  Cc: Hansen, Dave, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář, KarimAllah Ahmed, Peter Anvin, Nakajima, Jun,
	Ingo Molnar, the arch/x86 maintainers, Raj, Ashok, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Williams, Dan J, Joerg Roedel, Andrea Arcangeli,
	KVM list

> > you asked before and even before you sent the email I confirmed to
> > you that the document is correct
> >
> > I'm not sure what the point is to then question that again 15 minutes
> > later other than creating more noise.
> 
> Apologies, I hadn't seen the comment on IRC.
> 
> Sometimes the docs *don't* get it right, especially when they're
> released in a hurry as that one was. I note there's a *fourth* version
> of microcode-update-guidance.pdf available now, for example :)
> 
> So it is useful that you have explicitly stated that for *this*
> specific concern, the document is in fact correct that SMEP saves us
> from BTB and RSB pollution, *despite* the empirical evidence that those
> structures only hold the low 31 bits.

Your question was specific to the RSB, not the BTB. But please show the empirical evidence for the RSB.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:28                 ` Van De Ven, Arjan
@ 2018-01-26 18:43                   ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 18:43 UTC (permalink / raw)
  To: Van De Ven, Arjan, Arjan van de Ven, Andi Kleen, Linus Torvalds
  Cc: Hansen, Dave, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář, KarimAllah Ahmed, Peter Anvin, Nakajima, Jun,
	Ingo Molnar, the arch/x86 maintainers, Raj, Ashok, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Williams, Dan J, Joerg Roedel, Andrea Arcangeli,
	KVM list

On Fri, 2018-01-26 at 18:28 +0000, Van De Ven, Arjan wrote:
> > As you know well, I mean "we think Intel's document is not
> > correct".
>
> you asked before and even before you sent the email I confirmed to
> you that the document is correct
> 
> I'm not sure what the point is to then question that again 15 minutes
> later other than creating more noise.

Apologies, I hadn't seen the comment on IRC.

Sometimes the docs *don't* get it right, especially when they're
released in a hurry as that one was. I note there's a *fourth* version
of microcode-update-guidance.pdf available now, for example :)

So it is useful that you have explicitly stated that for *this*
specific concern, the document is in fact correct that SMEP saves us
from BTB and RSB pollution, *despite* the empirical evidence that those
structures only hold the low 31 bits.

I'm going to get back to other things now, although I'm sure others may
be very interested in reconciling the empirical evidence with what you
say, and in knowing *how* that can be the case, which I'm sure you
won't be able to say in public anyway.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:26               ` David Woodhouse
@ 2018-01-26 18:28                 ` Van De Ven, Arjan
  -1 siblings, 0 replies; 120+ messages in thread
From: Van De Ven, Arjan @ 2018-01-26 18:28 UTC (permalink / raw)
  To: David Woodhouse, Arjan van de Ven, Andi Kleen, Linus Torvalds
  Cc: Hansen, Dave, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krcmár, KarimAllah Ahmed, Peter Anvin, Nakajima, Jun,
	Ingo Molnar, the arch/x86 maintainers, Raj, Ashok, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Williams, Dan J, Joerg Roedel, Andrea Arcangeli,
	KVM list

> On Fri, 2018-01-26 at 10:12 -0800, Arjan van de Ven wrote:
> > On 1/26/2018 10:11 AM, David Woodhouse wrote:
> > >
> > > I am *actively* ignoring Skylake right now. This is about pre-SKL
> > > userspace even with SMEP, because we think Intel's document lies to us.
> >
> > if you think we lie to you then I think we're done with the conversation?
> >
> > Please tell us then what you deploy in AWS for your customers ?
> >
> > or show us research that shows we lied to you?
> 
> As you know well, I mean "we think Intel's document is not correct".

You asked before, and even before you sent the email I had confirmed to you that the document is correct.

I'm not sure what the point is of questioning that again 15 minutes later, other than creating more noise.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:12             ` Arjan van de Ven
@ 2018-01-26 18:26               ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 18:26 UTC (permalink / raw)
  To: Arjan van de Ven, Andi Kleen, Linus Torvalds
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Dan Williams, Joerg Roedel, Andrea Arcangeli,
	KVM list

On Fri, 2018-01-26 at 10:12 -0800, Arjan van de Ven wrote:
> On 1/26/2018 10:11 AM, David Woodhouse wrote:
> > 
> > I am *actively* ignoring Skylake right now. This is about pre-SKL
> > userspace even with SMEP, because we think Intel's document lies to us.
> 
> if you think we lie to you then I think we're done with the conversation?
> 
> Please tell us then what you deploy in AWS for your customers ?
> 
> or show us research that shows we lied to you?

As you know well, I mean "we think Intel's document is not correct". 

The evidence which made us suspect that is fairly clear in the last few
emails in this thread: it's about the BTB/RSB only holding the low bits
of the target, which would mean that userspace *can* put malicious
targets into the RSB, regardless of SMEP.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 18:11           ` David Woodhouse
@ 2018-01-26 18:12             ` Arjan van de Ven
  -1 siblings, 0 replies; 120+ messages in thread
From: Arjan van de Ven @ 2018-01-26 18:12 UTC (permalink / raw)
  To: David Woodhouse, Andi Kleen, Linus Torvalds
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Tom Lendacky, Dan Williams, Joerg Roedel, Andrea Arcangeli,
	KVM list

On 1/26/2018 10:11 AM, David Woodhouse wrote:
> 
> I am *actively* ignoring Skylake right now. This is about pre-SKL
> userspace even with SMEP, because we think Intel's document lies to us.

if you think we lie to you then I think we're done with the conversation?

Please tell us then what you deploy in AWS for your customers?

or show us research that shows we lied to you?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:59         ` Andi Kleen
@ 2018-01-26 18:11           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 18:11 UTC (permalink / raw)
  To: Andi Kleen, Linus Torvalds
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Arjan van de Ven, Tom Lendacky, Dan Williams, Joerg Roedel,
	Andrea Arcangeli, KVM list

On Fri, 2018-01-26 at 09:59 -0800, Andi Kleen wrote:
> On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> > 
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse  wrote:
> > > 
> > > 
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> > 
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
>
> For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
> BTB on underflow.

I am *actively* ignoring Skylake right now. This is about pre-SKL
userspace even with SMEP, because we think Intel's document lies to us.

If the RSB only holds the low bits of the target, then a userspace
attacker can populate an RSB entry which points to a kernel gadget of
her choice, even with SMEP or KPTI enabled.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:19       ` Linus Torvalds
@ 2018-01-26 17:59         ` Andi Kleen
  -1 siblings, 0 replies; 120+ messages in thread
From: Andi Kleen @ 2018-01-26 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, Dave Hansen, Liran Alon, Laura Abbott,
	Andrew Lutomirski, Janakarajan Natarajan, Borislav Petkov,
	Mallick, Asit K, Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Greg Kroah-Hartman, Masami Hiramatsu,
	Arjan van de Ven, Tom Lendacky, Dan Williams, Joerg Roedel,
	Andrea Arcangeli, KVM list

On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > Do we need to look again at the fact that we've disabled the RSB-
> > stuffing for SMEP?
> 
> Absolutely. SMEP helps make people a lot less worried about things,
> but it doesn't fix the "BTB only contains partial addresses" case.
> 
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
BTB on underflow.

It's also always needed with virtualization.

-Andi

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:29         ` David Woodhouse
@ 2018-01-26 17:31           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 17:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Andi Kleen, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Dan Williams,
	Joerg Roedel, Andrea Arcangeli, KVM list

On Fri, 2018-01-26 at 17:29 +0000, David Woodhouse wrote:
> On Fri, 2018-01-26 at 09:19 -0800, Linus Torvalds wrote:
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> >
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> > 
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
>
> That's the vmexit one. The one on context switch is in
> commit c995efd5a7 and has its own X86_FEATURE_RSB_CTXSW which in
> kernel/cpu/bugs.c is turned on for (!SMEP || Skylake).
> 
> The "low bits of the BTB" issue probably means that wants to be
> X86_FEATURE_RETPOLINE too. Despite Intel's doc saying otherwise.
> 
> (Intel's doc also says to do it on kernel entry, but we elected to do
> it on context switch instead since *that's* when the imbalances show up
> in the RSB.)

Note, we've switched from talking about BTB to RSB here, so this is a
valid concern if the *RSB* only has the low bits of the target.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:19       ` Linus Torvalds
@ 2018-01-26 17:29         ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26 17:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Andi Kleen, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Dan Williams,
	Joerg Roedel, Andrea Arcangeli, KVM list

On Fri, 2018-01-26 at 09:19 -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > 
> > Do we need to look again at the fact that we've disabled the RSB-
> > stuffing for SMEP?
> Absolutely. SMEP helps make people a lot less worried about things,
> but it doesn't fix the "BTB only contains partial addresses" case.
> 
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

That's the vmexit one. The one on context switch is in
commit c995efd5a7 and has its own X86_FEATURE_RSB_CTXSW which in
kernel/cpu/bugs.c is turned on for (!SMEP || Skylake).

The "low bits of the BTB" issue probably means that wants to be
X86_FEATURE_RETPOLINE too. Despite Intel's doc saying otherwise.

(Intel's doc also says to do it on kernel entry, but we elected to do
it on context switch instead since *that's* when the imbalances show up
in the RSB.)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26 17:19       ` Linus Torvalds
@ 2018-01-26 17:27         ` Borislav Petkov
  -1 siblings, 0 replies; 120+ messages in thread
From: Borislav Petkov @ 2018-01-26 17:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, Dave Hansen, Liran Alon, Laura Abbott,
	Andrew Lutomirski, Janakarajan Natarajan, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Andi Kleen, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Dan Williams,
	Joerg Roedel, Andrea Arcangeli, KVM list

On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

Or rather, enable stuffing on !SMEP:

+       if ((!boot_cpu_has(X86_FEATURE_PTI) &&
+            !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
+               setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+               pr_info("Filling RSB on context switch\n");
+       }

Should be

c995efd5a740 ("x86/retpoline: Fill RSB on context switch for affected CPUs")

in your tree.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26  9:11     ` David Woodhouse
@ 2018-01-26 17:19       ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2018-01-26 17:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Dave Hansen, Liran Alon, Laura Abbott, Andrew Lutomirski,
	Janakarajan Natarajan, Borislav Petkov, Mallick, Asit K,
	Radim Krčmář,
	KarimAllah Ahmed, Peter Anvin, Jun Nakajima, Ingo Molnar,
	the arch/x86 maintainers, Ashok Raj, Van De Ven, Arjan, Tim Chen,
	Paolo Bonzini, Andi Kleen, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	Masami Hiramatsu, Arjan van de Ven, Tom Lendacky, Dan Williams,
	Joerg Roedel, Andrea Arcangeli, KVM list

On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> Do we need to look again at the fact that we've disabled the RSB-
> stuffing for SMEP?

Absolutely. SMEP helps make people a lot less worried about things,
but it doesn't fix the "BTB only contains partial addresses" case.

But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
my tree, it's only conditional on X86_FEATURE_RETPOLINE.

                   Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26  2:23 ` Dave Hansen
@ 2018-01-26  9:11     ` David Woodhouse
  0 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26  9:11 UTC (permalink / raw)
  To: Dave Hansen, Liran Alon
  Cc: labbott, luto, Janakarajan.Natarajan, bp, torvalds,
	asit.k.mallick, rkrcmar, karahmed, hpa, jun.nakajima, mingo, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, aarcange, kvm

On Thu, 2018-01-25 at 18:23 -0800, Dave Hansen wrote:
> On 01/25/2018 06:11 PM, Liran Alon wrote:
> > 
> > It is true that attacker cannot speculate to a kernel-address, but it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerability to build a reliable exploit.
>
> The address doesn't leak if you can't execute there.  It's the same
> reason that we don't worry about speculation to user addresses from the
> kernel when SMEP is in play.

If both tags and target in the BTB are only 31 bits, then surely a
user-learned prediction of a branch from

  0x01234567 → 0x07654321

would be equivalent to a kernel-mode branch from

 0xffffffff81234567 → 0xffffffff87654321

... and interpreted in kernel mode as the latter? So I'm not sure why
SMEP saves us there?

Likewise if the RSB only stores the low 31 bits of the target, SMEP
isn't much help there either.

Do we need to look again at the fact that we've disabled the RSB-
stuffing for SMEP?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26  2:11 Liran Alon
@ 2018-01-26  8:46   ` David Woodhouse
  2018-01-26  8:46   ` David Woodhouse
  1 sibling, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-26  8:46 UTC (permalink / raw)
  To: Liran Alon, dave.hansen
  Cc: labbott, luto, Janakarajan.Natarajan, bp, torvalds,
	asit.k.mallick, rkrcmar, karahmed, hpa, jun.nakajima, mingo, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, aarcange, kvm

On Thu, 2018-01-25 at 18:11 -0800, Liran Alon wrote:
> 
> P.S:
> It seems to me that all these issues could be resolved completely at
> hardware in future CPUs if BTB/BHB/RSB entries were tagged with
> prediction-mode (or similar metadata). It will be nice if Intel/AMD
> could share if that is the planned long-term solution instead of
> IBRS-all-the-time.

IBRS-all-the-time is tagging with the ring and VMX root/non-root mode,
it seems. That much they could slip into the upcoming generation of
CPUs. And it's supposed to be fast¹; none of the dirty hacks in
microcode that they needed to implement the first-generation IBRS.

But we still need to tag with ASID/VMID and do proper flushing for
those, before we can completely ditch the need to do IBPB at the right
times.

Reading between the lines, I don't think they could add *that* without
stopping the fabs for a year or so while they go back to the drawing
board. But yes, I sincerely hope they *are* planning to do it, and
expose a 'SPECTRE_NO' bit in IA32_ARCH_CAPABILITIES, as soon as is
humanly possible.

¹ Fast enough that we'll want to use it and ALTERNATIVE out the 
  retpolines.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-26  2:11 Liran Alon
@ 2018-01-26  2:23 ` Dave Hansen
  2018-01-26  9:11     ` David Woodhouse
  2018-01-26  8:46   ` David Woodhouse
  1 sibling, 1 reply; 120+ messages in thread
From: Dave Hansen @ 2018-01-26  2:23 UTC (permalink / raw)
  To: Liran Alon
  Cc: labbott, luto, Janakarajan.Natarajan, bp, torvalds,
	asit.k.mallick, rkrcmar, karahmed, hpa, jun.nakajima, mingo, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, dwmw2, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, aarcange, kvm

On 01/25/2018 06:11 PM, Liran Alon wrote:
> It is true that attacker cannot speculate to a kernel-address, but it
> doesn't mean it cannot use the leaked kernel-address together with
> another unrelated vulnerability to build a reliable exploit.

The address doesn't leak if you can't execute there.  It's the same
reason that we don't worry about speculation to user addresses from the
kernel when SMEP is in play.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
@ 2018-01-26  2:11 Liran Alon
  2018-01-26  2:23 ` Dave Hansen
  2018-01-26  8:46   ` David Woodhouse
  0 siblings, 2 replies; 120+ messages in thread
From: Liran Alon @ 2018-01-26  2:11 UTC (permalink / raw)
  To: dave.hansen
  Cc: labbott, luto, Janakarajan.Natarajan, bp, torvalds,
	asit.k.mallick, rkrcmar, karahmed, hpa, jun.nakajima, mingo, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, dwmw2, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, aarcange, kvm

----- dave.hansen@intel.com wrote:

> On 01/23/2018 03:13 AM, Liran Alon wrote:
> > Therefore, breaking KASLR. In order to handle this, every exit from
> > kernel-mode to user-mode should stuff RSB. In addition, this stuffing
> > of RSB may need to be done from a fixed address to avoid leaking the
> > address of the RSB stuffing itself.
> 
> With PTI alone in place, I don't see how userspace could do anything
> with this information.  Even if userspace started to speculate to a
> kernel address, there is nothing at the kernel address to execute: no
> TLB entry, no PTE to load, nothing.
> 
> You probably have a valid point about host->guest, though.

I see it differently.

It is true that attacker cannot speculate to a kernel-address, but it doesn't mean it cannot use the leaked kernel-address together with another unrelated vulnerability to build a reliable exploit.

Security is built in layers.
The purpose of KASLR is to break the reliability of an exploit which relies on vulnerability primitives such as: memory-corruption at a kernel-address, hijacking kernel control-flow to a kernel-address, or even just reading a kernel-address. In modern exploitation, it is common to chain multiple different vulnerabilities in order to build a reliable exploit. Therefore, leaking a kernel-address could be exactly the missing primitive needed to complete the vulnerability-chain of a reliable exploit.

I don't see a big difference between leaking a kernel-address from user-mode vs. leaking a hypervisor-address from guest. They are both useful just as a primitive which is part of an exploit chain.

One could argue, though, that KASLR is currently fundamentally broken and therefore should no longer be considered a security boundary. This argument could be legitimate, as there were some well-known techniques that could break KASLR before the KPTI patch-set was introduced (e.g. timing memory accesses to kernel-addresses and measuring them reliably by leveraging TSX). Another well-known argument against KASLR is that it is a non-deterministic mitigation, which some argue is not good enough. However, I think that if we decide KASLR is not a security boundary anymore, it should be made loud and clear.

In general, I think there are some info-leak vulnerabilities in our current mitigation plan which don't seem to be addressed. I would be glad if we could address them clearly. These are all the open issues as I see them:

1) Because IBRS doesn't restrict low prediction-mode code from using BTB entries of high prediction-mode code, it is possible to info-leak addresses from high prediction-mode code to low prediction-mode code.
This is the KASLR breakage discussed above. Again, this could be ignored if we discard KASLR as a security boundary.

2) Neither IBRS nor retpoline prevents the BHB of high prediction-mode code from being used by low prediction-mode code. Therefore, low prediction-mode code could deduce the conditional branches taken by high prediction-mode code.

3) A leak similar to (1) exists because RSB entries of high prediction-mode code could be consumed by low prediction-mode code, which may reveal kernel-addresses. Again, we could decide that this isn't a security boundary. An alternative solution could be to just stuff the RSB from a fixed address on prediction-mode transitions.
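To make point (3) concrete, here is a toy C model of a return stack buffer, for illustration only. It assumes a 16-entry circular RSB (a figure often quoted for Skylake-class cores, but an assumption here); the names rsb_model, rsb_call, rsb_ret and rsb_stuff and all addresses are hypothetical. It shows that a kernel return address pushed by a call that was never matched by a return before the exit to user mode is exactly what a later user-mode return would be predicted to, and that stuffing the RSB from one fixed call site leaves only that single address to be learned.

```c
#include <stdint.h>

/* Toy model only: real RSB depth and behaviour vary per microarchitecture. */
#define RSB_ENTRIES 16  /* assumed depth, typical of Skylake-class cores */

struct rsb_model {
    uint64_t slot[RSB_ENTRIES];
    int top;    /* index where the next CALL pushes its return address */
    int valid;  /* number of entries a RET can still pop (0..RSB_ENTRIES) */
};

/* A CALL pushes its return address, overwriting the oldest entry when full. */
static void rsb_call(struct rsb_model *r, uint64_t ret_addr)
{
    r->slot[r->top] = ret_addr;
    r->top = (r->top + 1) % RSB_ENTRIES;
    if (r->valid < RSB_ENTRIES)
        r->valid++;
}

/*
 * A RET pops its predicted target. Returns 1 and stores the prediction if
 * an entry was available, or 0 on underflow (where some CPUs fall back to
 * the BTB instead).
 */
static int rsb_ret(struct rsb_model *r, uint64_t *predicted)
{
    if (r->valid == 0)
        return 0;
    r->top = (r->top + RSB_ENTRIES - 1) % RSB_ENTRIES;
    *predicted = r->slot[r->top];
    r->valid--;
    return 1;
}

/*
 * Stuffing: RSB_ENTRIES calls from one fixed site, so every entry ends up
 * holding the same benign return address, whatever was there before.
 */
static void rsb_stuff(struct rsb_model *r, uint64_t fixed_addr)
{
    for (int i = 0; i < RSB_ENTRIES; i++)
        rsb_call(r, fixed_addr);
}
```

In this model, an unmatched rsb_call(&r, 0xffffffff81234567) followed by a user-mode rsb_ret() predicts that kernel address; after rsb_stuff(&r, fixed) every one of the 16 predictions is `fixed`, so the only address an observer could learn is that of the stuffing code itself.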

-Liran

P.S.:
It seems to me that all these issues could be resolved completely in hardware in future CPUs if BTB/BHB/RSB entries were tagged with prediction-mode (or similar metadata). It would be nice if Intel/AMD could share whether that is the planned long-term solution instead of IBRS-all-the-time.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 11:13 Liran Alon
@ 2018-01-25 22:20 ` Dave Hansen
  0 siblings, 0 replies; 120+ messages in thread
From: Dave Hansen @ 2018-01-25 22:20 UTC (permalink / raw)
  To: Liran Alon, dwmw2
  Cc: labbott, luto, Janakarajan.Natarajan, torvalds, bp,
	asit.k.mallick, rkrcmar, karahmed, hpa, mingo, jun.nakajima, x86,
	ashok.raj, arjan.van.de.ven, tim.c.chen, pbonzini, ak,
	linux-kernel, peterz, tglx, gregkh, mhiramat, arjan,
	thomas.lendacky, dan.j.williams, joro, kvm, aarcange

On 01/23/2018 03:13 AM, Liran Alon wrote:
> Therefore, breaking KASLR. In order to handle this, every exit from
> kernel-mode to user-mode should stuff RSB. In addition, this stuffing
> of RSB may need to be done from a fixed address to avoid leaking the
> address of the RSB stuffing itself.

With PTI alone in place, I don't see how userspace could do anything
with this information.  Even if userspace started to speculate to a
kernel address, there is nothing at the kernel address to execute: no
TLB entry, no PTE to load, nothing.

You probably have a valid point about host->guest, though.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-25 16:19                       ` Mason
@ 2018-01-25 17:16                         ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 120+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-25 17:16 UTC (permalink / raw)
  To: Mason
  Cc: Linux ARM, David Woodhouse, Ingo Molnar, Linus Torvalds,
	KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	LKML

On Thu, Jan 25, 2018 at 05:19:04PM +0100, Mason wrote:
> On 23/01/2018 10:30, David Woodhouse wrote:
> 
> > Skylake takes predictions from the generic branch target buffer when
> > the RSB underflows.
> 
> Adding LAKML.
> 
> AFAIU, some ARM Cortex cores have the same optimization.
> (A9 maybe, A17 probably, some recent 64-bit cores)
> 
> Are there software work-arounds for Spectre planned for arm32 and arm64?

Yes, I think they are currently buried in one of the arm64 trees, and
they have been posted to the mailing list a few times in the past.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  9:30                     ` David Woodhouse
@ 2018-01-25 16:19                       ` Mason
  -1 siblings, 0 replies; 120+ messages in thread
From: Mason @ 2018-01-25 16:19 UTC (permalink / raw)
  To: Linux ARM
  Cc: David Woodhouse, Ingo Molnar, Linus Torvalds, KarimAllah Ahmed,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	LKML

On 23/01/2018 10:30, David Woodhouse wrote:

> Skylake takes predictions from the generic branch target buffer when
> the RSB underflows.

Adding LAKML.

AFAIU, some ARM Cortex cores have the same optimization.
(A9 maybe, A17 probably, some recent 64-bit cores)

Are there software work-arounds for Spectre planned for arm32 and arm64?

Regards.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-24  1:59                     ` Van De Ven, Arjan
@ 2018-01-24  3:25                       ` Andy Lutomirski
  -1 siblings, 0 replies; 120+ messages in thread
From: Andy Lutomirski @ 2018-01-24  3:25 UTC (permalink / raw)
  To: Van De Ven, Arjan
  Cc: Andy Lutomirski, Tim Chen, Woodhouse, David, Andi Kleen,
	Tom Lendacky, KarimAllah Ahmed, LKML, Andrea Arcangeli,
	Arjan van de Ven, Raj, Ashok, Mallick, Asit K, Borislav Petkov,
	Williams, Dan J, Hansen, Dave, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Nakajima, Jun, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krcmár, Thomas Gleixner, kvm list, X86 ML



> On Jan 23, 2018, at 5:59 PM, Van De Ven, Arjan <arjan.van.de.ven@intel.com> wrote:
> 
> 
>>> It is a reasonable approach.  Let a process who needs max security
>>> opt in with disabled dumpable. It can have a flush with IBPB clear before
>>> starting to run, and have STIBP set while running.
>>> 
>> 
>> Do we maybe want a separate opt in?  I can easily imagine things like
>> web browsers that *don't* want to be non-dumpable but do want this
>> opt-in.
> 
> eventually we need something better. Probably in addition.
> dumpable is used today for things that want this.
> 
>> 
>> Also, what's the performance hit of STIBP?
> 
> pretty steep, but it depends on the CPU generation, for some it's cheaper than others. (yes I realize this is a vague answer, but the range is really from just about zero to oh my god)
> 
> I'm not a fan of doing this right now, to be honest. We really need to avoid doing this piecemeal, and come up with a better concept of protection at a higher level.
> For example, you mention web browsers, but the threat model for browsers is generally internet content. For V2 to work you need to get some "evil pointer" into the app from the observer and browsers usually aren't doing that.
> The most likely user would be some software-TPM-like service that has magic keys.
> 
> And for keys we want something else... we want a madvise() sort of thing that does a few things, like equivalent of mlock (so the key does not end up in swap),

I'd love to see a slight variant: encrypt that page against some ephemeral key if it gets swapped.

> not having the page (but potentially the rest) end up in core dumps, and the kernel making sure that if the program exits (say for segv) that the key page gets zeroed before going into the free pool. Once you have that as a feature, making the key speculation-safe is not too hard (Intel and ARM have CPU options to mark pages for that)
> 
> 

How do we do that on Intel?  Make it UC?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-24  1:00                   ` Andy Lutomirski
@ 2018-01-24  1:59                     ` Van De Ven, Arjan
  -1 siblings, 0 replies; 120+ messages in thread
From: Van De Ven, Arjan @ 2018-01-24  1:59 UTC (permalink / raw)
  To: Andy Lutomirski, Tim Chen
  Cc: Woodhouse, David, Andi Kleen, Tom Lendacky, KarimAllah Ahmed,
	LKML, Andrea Arcangeli, Arjan van de Ven, Raj, Ashok, Mallick,
	Asit K, Borislav Petkov, Williams, Dan J, Hansen, Dave,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Nakajima, Jun, Laura Abbott,
	Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krcmár, Thomas Gleixner, kvm list, X86 ML

> > It is a reasonable approach.  Let a process who needs max security
> > opt in with disabled dumpable. It can have a flush with IBPB clear before
> > starting to run, and have STIBP set while running.
> >
> 
> Do we maybe want a separate opt in?  I can easily imagine things like
> web browsers that *don't* want to be non-dumpable but do want this
> opt-in.

eventually we need something better. Probably in addition.
dumpable is used today for things that want this.

> 
> Also, what's the performance hit of STIBP?

pretty steep, but it depends on the CPU generation, for some it's cheaper than others. (yes I realize this is a vague answer, but the range is really from just about zero to oh my god)

I'm not a fan of doing this right now, to be honest. We really need to avoid doing this piecemeal, and come up with a better concept of protection at a higher level.
For example, you mention web browsers, but the threat model for browsers is generally internet content. For V2 to work you need to get some "evil pointer" into the app from the observer and browsers usually aren't doing that.
The most likely user would be some software-TPM-like service that has magic keys.

And for keys we want something else... we want a madvise() sort of thing that does a few things, like equivalent of mlock (so the key does not end up in swap), not having the page (but potentially the rest) end up in core dumps, and the kernel making sure that if the program exits (say for segv) that the key page gets zeroed before going into the free pool. Once you have that as a feature, making the key speculation-safe is not too hard (Intel and ARM have CPU options to mark pages for that)
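Part of this can already be approximated from user space with existing Linux calls. The sketch below is hypothetical (key_page_init, key_page_wipe and key_page_free are illustrative names, not a proposed API): it maps one anonymous page, uses mlock() to keep it out of swap and madvise(MADV_DONTDUMP) (Linux 3.4+) to keep it out of core dumps, and wipes it explicitly before unmapping. It cannot give the kernel-side guarantees described above (zeroing when the process dies on SIGSEGV, or marking the page speculation-safe); those would need the new interface. For simplicity it records mlock()/madvise() failures instead of treating them as fatal, which real key-handling code would probably not do.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct key_page {
    unsigned char *buf;
    size_t len;
    int locked;   /* 1 if mlock() succeeded (it can fail under RLIMIT_MEMLOCK) */
    int no_dump;  /* 1 if MADV_DONTDUMP was accepted */
};

/* Map one anonymous page, try to pin it, and try to exclude it from dumps. */
static int key_page_init(struct key_page *kp)
{
    long pagesz = sysconf(_SC_PAGESIZE);

    kp->buf = NULL;
    kp->len = 0;
    kp->locked = 0;
    kp->no_dump = 0;
    if (pagesz <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    kp->buf = p;
    kp->len = (size_t)pagesz;
    kp->locked = (mlock(kp->buf, kp->len) == 0);                 /* no swap */
    kp->no_dump = (madvise(kp->buf, kp->len, MADV_DONTDUMP) == 0); /* no core */
    return 0;
}

/* Zero the page through a volatile pointer so the stores are not elided. */
static void key_page_wipe(struct key_page *kp)
{
    volatile unsigned char *p = kp->buf;

    for (size_t i = 0; i < kp->len; i++)
        p[i] = 0;
}

/* Wipe, unpin and unmap; only covers orderly exits, not crashes. */
static void key_page_free(struct key_page *kp)
{
    if (!kp->buf)
        return;
    key_page_wipe(kp);
    if (kp->locked)
        munlock(kp->buf, kp->len);
    munmap(kp->buf, kp->len);
    kp->buf = NULL;
}
```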

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-24  1:00                   ` Andy Lutomirski
@ 2018-01-24  1:22                     ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-24  1:22 UTC (permalink / raw)
  To: Andy Lutomirski, Tim Chen
  Cc: Andi Kleen, Tom Lendacky, KarimAllah Ahmed, LKML,
	Andrea Arcangeli, Arjan van de Ven, Ashok Raj, Asit Mallick,
	Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, kvm list, X86 ML, Arjan Van De Ven

[-- Attachment #1: Type: text/plain, Size: 1823 bytes --]

On Tue, 2018-01-23 at 17:00 -0800, Andy Lutomirski wrote:
> On Tue, Jan 23, 2018 at 4:47 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > 
> > On 01/23/2018 03:14 PM, Woodhouse, David wrote:
> > > 
> > > On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > > > 
> > > > > 
> > > > > Not sure.  Maybe to start, the answer might be to allow it to be set for
> > > > > the ultra-paranoid, but in general don't enable it by default.  Having it
> > > > > enabled would be an alternative to someone deciding to disable SMT, since
> > > > > that would have even more of a performance impact.
> > > > I agree. A reasonable strategy would be to only enable it for
> > > > processes that have dumpable disabled. This should be already set for
> > > > high value processes like GPG, and allows others to opt-in if
> > > > they need to.
> > > That seems to make sense, and I think was the solution we were
> > > approaching for IBPB on context switch too, right?
> > > 
> > > Are we generally agreed on dumpable as the criterion for both of those?
> > > 
> > It is a reasonable approach.  Let a process who needs max security
> > opt in with disabled dumpable. It can have a flush with IBPB clear before
> > starting to run, and have STIBP set while running.
> > 
> Do we maybe want a separate opt in?  I can easily imagine things like
> web browsers that *don't* want to be non-dumpable but do want this
> opt-in.
 
This is to protect you from another local process running on a HT
sibling. Not the kind of thing that web browsers are normally worrying
about.

> Also, what's the performance hit of STIBP?

Varies per CPU generation, but generally approaching that of full IBRS
I think? I don't recall looking at this specifically (since we haven't
actually used it for this yet).

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-24  0:47                 ` Tim Chen
@ 2018-01-24  1:00                   ` Andy Lutomirski
  -1 siblings, 0 replies; 120+ messages in thread
From: Andy Lutomirski @ 2018-01-24  1:00 UTC (permalink / raw)
  To: Tim Chen
  Cc: Woodhouse, David, Andi Kleen, Tom Lendacky, KarimAllah Ahmed,
	LKML, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, kvm list, X86 ML, Arjan Van De Ven

On Tue, Jan 23, 2018 at 4:47 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> On 01/23/2018 03:14 PM, Woodhouse, David wrote:
>> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
>>>> Not sure.  Maybe to start, the answer might be to allow it to be set for
>>>> the ultra-paranoid, but in general don't enable it by default.  Having it
>>>> enabled would be an alternative to someone deciding to disable SMT, since
>>>> that would have even more of a performance impact.
>>>
>>> I agree. A reasonable strategy would be to only enable it for
>>> processes that have dumpable disabled. This should be already set for
>>> high value processes like GPG, and allows others to opt-in if
>>> they need to.
>>
>> That seems to make sense, and I think was the solution we were
>> approaching for IBPB on context switch too, right?
>>
>> Are we generally agreed on dumpable as the criterion for both of those?
>>
>
> It is a reasonable approach.  Let a process who needs max security
> opt in with disabled dumpable. It can have a flush with IBPB clear before
> starting to run, and have STIBP set while running.
>

Do we maybe want a separate opt in?  I can easily imagine things like
web browsers that *don't* want to be non-dumpable but do want this
opt-in.

Also, what's the performance hit of STIBP?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 23:14               ` Woodhouse, David
@ 2018-01-24  0:47                 ` Tim Chen
  -1 siblings, 0 replies; 120+ messages in thread
From: Tim Chen @ 2018-01-24  0:47 UTC (permalink / raw)
  To: Woodhouse, David, Andi Kleen, Tom Lendacky
  Cc: Andy Lutomirski, KarimAllah Ahmed, linux-kernel,
	Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
	Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, kvm, x86, Arjan Van De Ven

On 01/23/2018 03:14 PM, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
>>> Not sure.  Maybe to start, the answer might be to allow it to be set for
>>> the ultra-paranoid, but in general don't enable it by default.  Having it
>>> enabled would be an alternative to someone deciding to disable SMT, since
>>> that would have even more of a performance impact.
>>
>> I agree. A reasonable strategy would be to only enable it for
>> processes that have dumpable disabled. This should be already set for
>> high value processes like GPG, and allows others to opt-in if
>> they need to.
> 
> That seems to make sense, and I think was the solution we were
> approaching for IBPB on context switch too, right?
> 
> Are we generally agreed on dumpable as the criterion for both of those?
> 

It is a reasonable approach.  Let a process who needs max security
opt in with disabled dumpable. It can have a flush with IBPB clear before
starting to run, and have STIBP set while running.
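Under this criterion a process opts in simply by making itself non-dumpable, which it can already do with prctl(PR_SET_DUMPABLE). A minimal sketch follows (the helper name is hypothetical). Note that whether the kernel then issues IBPB on switches to such a task, or sets STIBP while it runs, is what is being proposed in this thread, not something this call is guaranteed to trigger. Being non-dumpable also has other effects (no core dumps, and ptrace attach by unprivileged processes is refused), which is part of why a separate opt-in was suggested.

```c
#include <sys/prctl.h>

/*
 * Mark the calling process non-dumpable. Returns the dumpable value read
 * back via PR_GET_DUMPABLE (0 on success), or -1 if prctl() failed.
 */
static int opt_in_non_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```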

Tim

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  7:29                 ` Ingo Molnar
  (?)
  (?)
@ 2018-01-24  0:05                 ` Andi Kleen
  -1 siblings, 0 replies; 120+ messages in thread
From: Andi Kleen @ 2018-01-24  0:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Woodhouse, Linus Torvalds, KarimAllah Ahmed,
	Linux Kernel Mailing List, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
	Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
	Laura Abbott

Ingo Molnar <mingo@kernel.org> writes:
>
> Is there any reason why this wouldn't work?

To actually maintain the true call depth you would need to intercept the
return of the function too, because the counter has to be decremented
at the end of the function.

Plain ftrace cannot do that because it only intercepts the function
entry.

The function graph tracer can do this, but only at the cost of
overwriting the return address (and saving the real return address on
a separate stack).

This causes a mispredict on every return, plus other overhead,
and is one of the reasons why the function graph tracer
is so much slower than the plain function tracer.

I suspect the overhead would be significant.

To make your scheme work efficiently, we would likely
need custom gcc instrumentation for the returns.
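To make the entry-versus-return point concrete, here is a minimal C model of a per-thread call-depth counter. The hooks `hook_entry()` and `hook_return()` and the `depth` variable are hypothetical stand-ins for whatever instrumentation such a scheme would use, and they are called by hand here. With entry-only hooks (what plain ftrace provides) the counter only ever grows: after one call to `caller()` it reads 3 even though the call stack is empty again. With return hooks as well, it goes back to 0.

```c
#include <assert.h>

/* Hypothetical per-thread call-depth counter. */
static int depth;

/* Hypothetical hooks; real ones would come from instrumentation. */
static void hook_entry(void)
{
	depth++;
}

static void hook_return(void)
{
	assert(depth > 0); /* a return must match an earlier entry */
	depth--;
}

/* Instrumented at entry only (track_return == 0), like plain ftrace,
 * or at both entry and return (track_return != 0). */
static void leaf(int track_return)
{
	hook_entry();
	if (track_return)
		hook_return();
}

static void caller(int track_return)
{
	hook_entry();
	leaf(track_return);
	leaf(track_return);
	if (track_return)
		hook_return();
}
```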

FWIW our plan was to add enough manual RSB stuffing at strategic
points, until we're confident the coverage is good enough.

-Andi

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 23:14               ` Woodhouse, David
@ 2018-01-23 23:22                 ` Andi Kleen
  -1 siblings, 0 replies; 120+ messages in thread
From: Andi Kleen @ 2018-01-23 23:22 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: thomas.lendacky, kvm, linux-kernel, peterz, ashok.raj, Raslan,
	KarimAllah, arjan.van.de.ven, arjan, bp, tglx,
	Janakarajan.Natarajan, tim.c.chen, torvalds, joro,
	dan.j.williams, x86, hpa, aarcange, mingo, luto, pbonzini,
	gregkh, dave.hansen, luto, mhiramat, asit.k.mallick,
	jun.nakajima, labbott, rkrcmar

On Tue, Jan 23, 2018 at 11:14:36PM +0000, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > > Not sure.  Maybe to start, the answer might be to allow it to be set for
> > > the ultra-paranoid, but in general don't enable it by default.  Having it
> > > enabled would be an alternative to someone deciding to disable SMT, since
> > > that would have even more of a performance impact.
> > 
> > I agree. A reasonable strategy would be to only enable it for
> > processes that have dumpable disabled. This should be already set for
> > high value processes like GPG, and allows others to opt-in if
> > they need to.
> 
> That seems to make sense, and I think was the solution we were
> approaching for IBPB on context switch too, right?

Right.

-Andi

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 22:49             ` Andi Kleen
@ 2018-01-23 23:14               ` Woodhouse, David
  -1 siblings, 0 replies; 120+ messages in thread
From: Woodhouse, David @ 2018-01-23 23:14 UTC (permalink / raw)
  To: thomas.lendacky, ak
  Cc: kvm, linux-kernel, peterz, ashok.raj, Raslan, KarimAllah,
	arjan.van.de.ven, arjan, bp, tglx, Janakarajan.Natarajan,
	tim.c.chen, torvalds, joro, dan.j.williams, x86, hpa, aarcange,
	mingo, luto, pbonzini, gregkh, dave.hansen, luto, mhiramat,
	asit.k.mallick, jun.nakajima, labbott, rkrcmar


[-- Attachment #1.1: Type: text/plain, Size: 763 bytes --]

On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > Not sure.  Maybe to start, the answer might be to allow it to be set for
> > the ultra-paranoid, but in general don't enable it by default.  Having it
> > enabled would be an alternative to someone deciding to disable SMT, since
> > that would have even more of a performance impact.
> 
> I agree. A reasonable strategy would be to only enable it for
> processes that have dumpable disabled. This should be already set for
> high value processes like GPG, and allows others to opt-in if
> they need to.

That seems to make sense, and I think was the solution we were
approaching for IBPB on context switch too, right?

Are we generally agreed on dumpable as the criterion for both of those?

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 197 bytes --]




Amazon Web Services UK Limited. Registered in England and Wales with registration number 08650665 and which has its registered office at 60 Holborn Viaduct, London EC1A 2FD, United Kingdom.

[-- Attachment #2.2: Type: text/html, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 22:37           ` Tom Lendacky
@ 2018-01-23 22:49             ` Andi Kleen
  -1 siblings, 0 replies; 120+ messages in thread
From: Andi Kleen @ 2018-01-23 22:49 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Woodhouse, David, Andy Lutomirski, KarimAllah Ahmed,
	linux-kernel, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
	Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
	Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
	Peter Zijlstra, Radim Krčmář,
	Thomas Gleixner, Tim Chen, kvm, x86, Arjan Van De Ven

> Not sure.  Maybe to start, the answer might be to allow it to be set for
> the ultra-paranoid, but in general don't enable it by default.  Having it
> enabled would be an alternative to someone deciding to disable SMT, since
> that would have even more of a performance impact.

I agree. A reasonable strategy would be to only enable it for
processes that have dumpable disabled. This should be already set for
high value processes like GPG, and allows others to opt-in if
they need to.
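Under this proposal, a process would opt in by clearing its dumpable flag. Below is a minimal userspace sketch using the real `prctl(2)` options `PR_SET_DUMPABLE` and `PR_GET_DUMPABLE`. The helper name is hypothetical. The IBPB/STIBP behaviour tied to the flag is only what is being proposed in this thread, not something the call itself guarantees.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Hypothetical helper: mark the calling process as non-dumpable, which is
 * the opt-in signal proposed above.  Returns 0 on success, -1 on failure
 * (errno is set by prctl). */
static int opt_in_to_branch_isolation(void)
{
	if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
		return -1;

	/* PR_GET_DUMPABLE reports the current flag as the return value. */
	int dumpable = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
	assert(dumpable == 0);
	return 0;
}
```

Clearing dumpable has other effects as well, such as disabling core dumps and restricting ptrace of the process. That makes it a coarse opt-in signal rather than a dedicated one.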

-Andi

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 16:20         ` Woodhouse, David
@ 2018-01-23 22:37           ` Tom Lendacky
  -1 siblings, 0 replies; 120+ messages in thread
From: Tom Lendacky @ 2018-01-23 22:37 UTC (permalink / raw)
  To: Woodhouse, David, Andy Lutomirski, KarimAllah Ahmed
  Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
	Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
	Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
	Peter Zijlstra, Radim Krčmář,
	Thomas Gleixner, Tim Chen, kvm, x86, Arjan Van De Ven

On 1/23/2018 10:20 AM, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 10:12 -0600, Tom Lendacky wrote:
>>
>>>> +.macro UNRESTRICT_IB_SPEC
>>>> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>>>> +    PUSH_MSR_REGS
>>>> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
>>>  
>> I think you should be writing 2, not 0, since I'm reasonably
>> confident that we want STIBP on.  Can you explain why you're writing
>> 0?
>>
>> Do we want to talk about STIBP in general?  Should it be (yet another)
>> boot option to enable or disable?  If there is STIBP support without
>> IBRS support, it could be a set and forget at boot time.
> 
> We haven't got patches which enable STIBP in general. The kernel itself
> is safe either way with retpoline, or because IBRS implies STIBP too
> (that is, there's no difference between writing 1 and 3).
> 
> So STIBP is purely about protecting userspace processes from one
> another, and VM guests from one another, when they run on HT siblings.
> 
> There's an argument that there are so many other information leaks
> between HT siblings that we might not care. Especially as it's hard to
> *tell*, when you're scheduling, whether you trust all the processes (or
> guests) on your HT siblings right now... let alone later, when
> scheduling another process, whether you need to *now* set STIBP on a sibling
> which is no longer safe from this process now running.
> 
> I'm not sure we want to set STIBP *unconditionally* either because of
> the performance implications.
> 
> For IBRS we had an answer and it was just ugly. For STIBP we don't
> actually have an answer for "how do we use this?". Do we?

Not sure.  Maybe to start, the answer might be to allow it to be set for
the ultra-paranoid, but in general don't enable it by default.  Having it
enabled would be an alternative to someone deciding to disable SMT, since
that would have even more of a performance impact.

Thanks,
Tom

> 
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 20:28       ` David Woodhouse
@ 2018-01-23 20:16         ` Pavel Machek
  -1 siblings, 0 replies; 120+ messages in thread
From: Pavel Machek @ 2018-01-23 20:16 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
	Arjan Van De Ven

On Sun 2018-01-21 20:28:17, David Woodhouse wrote:
> On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
> > All of this is pure garbage.
> > 
> > Is Intel really planning on making this shit architectural? Has
> > anybody talked to them and told them they are f*cking insane?
> > 
> > Please, any Intel engineers here - talk to your managers. 
> 
> If the alternative was a two-decade product recall and giving everyone
> free CPUs, I'm not sure it was entirely insane.
> 
> Certainly it's a nasty hack, but hey — the world was on fire and in the
> end we didn't have to just turn the datacentres off and go back to goat
> farming, so it's not all bad.

Well, someone at Intel set the world on fire. And then kept selling faulty
CPUs for half a year while the world was on fire; they knew they were
faulty, yet they sold them anyway.

Then Intel talks about how great they are and how important security
is to them... Intentionally conflating Meltdown and
Spectre so they can mask how badly they screwed up. And without apologies.

> As a hack for existing CPUs, it's just about tolerable — as long as it
> can die entirely by the next generation.
> 
> So the part I think is odd is the IBRS_ALL feature, where a future
> CPU will advertise "I am able to be not broken" and then you have to
> set the IBRS bit once at boot time to *ask* it not to be broken. That
> part is weird, because it ought to have been treated like the RDCL_NO
> bit — just "you don't have to worry any more, it got better".
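The contrast drawn above can be sketched as the boot-time decision a kernel would make from IA32_ARCH_CAPABILITIES (MSR 0x10a), where RDCL_NO is bit 0 and IBRS_ALL is bit 1. This is a minimal C illustration under that assumption. `struct boot_plan` and `plan_boot()` are hypothetical, and the MSR value is passed in rather than read with RDMSR. RDCL_NO requires no action at all, while IBRS_ALL still requires software to write IBRS once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES (MSR 0x10a) bits referred to above. */
#define ARCH_CAP_RDCL_NO  (1ull << 0) /* not susceptible to Meltdown */
#define ARCH_CAP_IBRS_ALL (1ull << 1) /* IBRS may stay set permanently */

#define SPEC_CTRL_IBRS    (1ull << 0)

/* Hypothetical summary of what boot code would do. */
struct boot_plan {
	bool     need_pti;       /* Meltdown mitigation (page-table isolation) */
	uint64_t spec_ctrl_once; /* value written once at boot, 0 = none */
};

static struct boot_plan plan_boot(uint64_t arch_caps)
{
	struct boot_plan p = { true, 0 };

	/* RDCL_NO: "it got better", nothing to do. */
	if (arch_caps & ARCH_CAP_RDCL_NO)
		p.need_pti = false;

	/* IBRS_ALL: software still has to *ask*, by setting IBRS once. */
	if (arch_caps & ARCH_CAP_IBRS_ALL)
		p.spec_ctrl_once = SPEC_CTRL_IBRS;

	assert(p.spec_ctrl_once == 0 || p.spec_ctrl_once == SPEC_CTRL_IBRS);
	return p;
}
```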

And now Intel wants to cheat at benchmarks, putting companies that do
the right thing at a disadvantage, and thinks that's okay because the world
was on fire?

At this point, I believe that yes, a product recall would be
appropriate. If Intel is not willing to do it on their own, well,
perhaps courts can force them. Ouch, and I would not mind some jail time
for whoever is responsible for selling known-faulty CPUs to the public.

Oh, and still no word about the real fixes. The world is not only Linux,
you see? https://pavelmachek.livejournal.com/140949.html?nojs=1

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 16:12       ` Tom Lendacky
@ 2018-01-23 16:20         ` Woodhouse, David
  -1 siblings, 0 replies; 120+ messages in thread
From: Woodhouse, David @ 2018-01-23 16:20 UTC (permalink / raw)
  To: thomas.lendacky, luto, Raslan, KarimAllah
  Cc: kvm, linux-kernel, peterz, ashok.raj, arjan, arjan.van.de.ven,
	bp, torvalds, tglx, Janakarajan.Natarajan, tim.c.chen, ak, joro,
	dan.j.williams, x86, hpa, aarcange, mingo, luto, pbonzini,
	gregkh, dave.hansen, mhiramat, asit.k.mallick, jun.nakajima,
	labbott, rkrcmar

[-- Attachment #1.1: Type: text/plain, Size: 1585 bytes --]

On Tue, 2018-01-23 at 10:12 -0600, Tom Lendacky wrote:
> 
> >> +.macro UNRESTRICT_IB_SPEC
> >> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> >> +    PUSH_MSR_REGS
> >> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
> > 
> I think you should be writing 2, not 0, since I'm reasonably
> confident that we want STIBP on.  Can you explain why you're writing
> 0?
> 
> Do we want to talk about STIBP in general?  Should it be (yet another)
> boot option to enable or disable?  If there is STIBP support without
> IBRS support, it could be a set and forget at boot time.

We haven't got patches which enable STIBP in general. The kernel itself
is safe either way with retpoline, or because IBRS implies STIBP too
(that is, there's no difference between writing 1 and 3).
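Here is a minimal C model of the four IA32_SPEC_CTRL values discussed in this subthread (0, 1, 2 and 3). It assumes IBRS is bit 0 and STIBP is bit 1. It encodes the statement above that IBRS implies STIBP, so 1 and 3 behave the same. The helper names are hypothetical, and the model is no substitute for the vendor documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1u << 0)
#define SPEC_CTRL_STIBP (1u << 1)

/* Hypothetical: is the thread protected from its HT sibling's branch
 * training?  Since IBRS implies STIBP, either bit is enough. */
static bool sibling_isolated(uint32_t spec_ctrl)
{
	assert(spec_ctrl <= 3); /* only the two bits modelled here */
	return (spec_ctrl & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) != 0;
}

/* Hypothetical: are indirect branches restricted from being steered by
 * less privileged code?  Only IBRS does that. */
static bool kernel_restricted(uint32_t spec_ctrl)
{
	assert(spec_ctrl <= 3);
	return (spec_ctrl & SPEC_CTRL_IBRS) != 0;
}
```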

So STIBP is purely about protecting userspace processes from one
another, and VM guests from one another, when they run on HT siblings.

There's an argument that there are so many other information leaks
between HT siblings that we might not care. Especially as it's hard to
*tell*, when you're scheduling, whether you trust all the processes (or
guests) on your HT siblings right now... let alone later, when
scheduling another process, whether you need to *now* set STIBP on a sibling
which is no longer safe from this process now running.

I'm not sure we want to set STIBP *unconditionally* either because of
the performance implications.

For IBRS we had an answer and it was just ugly. For STIBP we don't
actually have an answer for "how do we use this?". Do we?

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 19:14     ` Andy Lutomirski
@ 2018-01-23 16:12       ` Tom Lendacky
  -1 siblings, 0 replies; 120+ messages in thread
From: Tom Lendacky @ 2018-01-23 16:12 UTC (permalink / raw)
  To: Andy Lutomirski, KarimAllah Ahmed
  Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, kvm, x86, Arjan Van De Ven

On 1/21/2018 1:14 PM, Andy Lutomirski wrote:
> 
> 
>> On Jan 20, 2018, at 11:23 AM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
>>
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Create macros to control Indirect Branch Speculation.
>>
>> Name them so they reflect what they are actually doing.
>> The macros are used to restrict and unrestrict the indirect branch speculation.
>> They do not *disable* (or *enable*) indirect branch speculation. A trip back to
>> user-space after *restricting* speculation would still affect the BTB.
>>
>> Quoting from a commit by Tim Chen:
>>
>> """
>>    If IBRS is set, near returns and near indirect jumps/calls will not allow
>>    their predicted target address to be controlled by code that executed in a
>>    less privileged prediction mode *BEFORE* the IBRS mode was last written with
>>    a value of 1 or on another logical processor so long as all Return Stack
>>    Buffer (RSB) entries from the previous less privileged prediction mode are
>>    overwritten.
>>
>>    Thus a near indirect jump/call/return may be affected by code in a less
>>    privileged prediction mode that executed *AFTER* IBRS mode was last written
>>    with a value of 1.
>> """
>>
>> [ tglx: Changed macro names and rewrote changelog ]
>> [ karahmed: changed macro names *again* and rewrote changelog ]
>>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Greg KH <gregkh@linuxfoundation.org>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: David Woodhouse <dwmw@amazon.co.uk>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> ---
>> arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 73 insertions(+)
>>
>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>> index 3f48f69..5aafb51 100644
>> --- a/arch/x86/entry/calling.h
>> +++ b/arch/x86/entry/calling.h
>> @@ -6,6 +6,8 @@
>> #include <asm/percpu.h>
>> #include <asm/asm-offsets.h>
>> #include <asm/processor-flags.h>
>> +#include <asm/msr-index.h>
>> +#include <asm/cpufeatures.h>
>>
>> /*
>>
>> @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
>> .Lafter_call_\@:
>> #endif
>> .endm
>> +
>> +/*
>> + * IBRS related macros
>> + */
>> +.macro PUSH_MSR_REGS
>> +    pushq    %rax
>> +    pushq    %rcx
>> +    pushq    %rdx
>> +.endm
>> +
>> +.macro POP_MSR_REGS
>> +    popq    %rdx
>> +    popq    %rcx
>> +    popq    %rax
>> +.endm
>> +
>> +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
>> +    movl    \msr_nr, %ecx
>> +    movl    \edx_val, %edx
>> +    movl    \eax_val, %eax
>> +    wrmsr
>> +.endm
>> +
>> +.macro RESTRICT_IB_SPEC
>> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>> +    PUSH_MSR_REGS
>> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
>> +    POP_MSR_REGS
>> +.Lskip_\@:
>> +.endm
>> +
>> +.macro UNRESTRICT_IB_SPEC
>> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>> +    PUSH_MSR_REGS
>> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
> 
> I think you should be writing 2, not 0, since I'm reasonably confident that we want STIBP on.  Can you explain why you're writing 0?

Do we want to talk about STIBP in general?  Should it be (yet another)
boot option to enable or disable?  If there is STIBP support without
IBRS support, it could be a set and forget at boot time.
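For reference, the quoted `WRMSR_ASM` macro loads ECX with the MSR number and EDX:EAX with the high and low 32-bit halves of the value. Writing 2 instead of 0 in `UNRESTRICT_IB_SPEC` would therefore mean passing `$0` for EDX and 2 (the STIBP bit) for EAX. Below is a minimal C sketch of that split. It assumes MSR_IA32_SPEC_CTRL is 0x48, and `struct wrmsr_regs` and `wrmsr_operands()` are hypothetical names. It only computes the register contents, because WRMSR itself is privileged and would fault in userspace.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL 0x48u
#define SPEC_CTRL_STIBP    (1u << 1)

/* Hypothetical: the registers WRMSR_ASM sets up before "wrmsr". */
struct wrmsr_regs {
	uint32_t ecx; /* MSR number */
	uint32_t edx; /* high 32 bits of the value */
	uint32_t eax; /* low 32 bits of the value */
};

static struct wrmsr_regs wrmsr_operands(uint32_t msr, uint64_t val)
{
	struct wrmsr_regs r = {
		.ecx = msr,
		.edx = (uint32_t)(val >> 32),
		.eax = (uint32_t)val,
	};

	/* Recombining EDX:EAX must give back the original value. */
	assert((((uint64_t)r.edx << 32) | r.eax) == val);
	return r;
}
```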

Thanks,
Tom

> 
> Also, holy cow, there are so many macros here.
> 
> And a meta question: why are there so many submitters of the same series?
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  9:27                     ` Ingo Molnar
@ 2018-01-23 15:01                       ` Dave Hansen
  -1 siblings, 0 replies; 120+ messages in thread
From: Dave Hansen @ 2018-01-23 15:01 UTC (permalink / raw)
  To: Ingo Molnar, David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On 01/23/2018 01:27 AM, Ingo Molnar wrote:
> 
>  - All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The 
>    tracking could probably made IRQ and maybe even NMI safe, but the worst-case 
>    nesting scenarios make my head ache.)

This all sounds totally workable to me.  We talked about using ftrace
itself to track call depth, but it would be unusable in production, of
course.  This seems workable, though.  You're also totally right about
the zero overhead on most kernels with it turned off when we don't need
RSB underflow protection (basically pre-Skylake).

I also agree that the safe thing to do is to just stuff before IRET.  I
bet we can get an ftrace-driven RSB tracker working precisely enough even
with NMIs, but it's way simpler to just stuff and be done with it for now.

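A minimal sketch of the bookkeeping behind a call-depth tracker, as discussed above. Assumptions (not taken from any code in this thread): a 16-entry RSB, matching the "16+ CALLs" figure quoted elsewhere in this thread; one counter per CPU; and a hypothetical rsb_stuff() that refills every entry. When a RET would find the RSB empty, a Skylake-era core may predict it from the BTB instead, so the model refills first. It ignores the ftrace hooks themselves and the IRQ/NMI nesting cases Ingo mentions, which is why stuffing before IRET is the simpler choice.

```c
#include <assert.h>
#include <stdbool.h>

#define RSB_ENTRIES 16 /* assumed RSB depth for Skylake-era cores */

/* Per-CPU bookkeeping: how many RSB entries are known to be non-empty. */
struct rsb_tracker {
    int valid;       /* 0..RSB_ENTRIES */
    unsigned stuffs; /* number of refills performed (for illustration) */
};

/* Instrumented CALL: the CPU pushes one RSB entry. */
static void track_call(struct rsb_tracker *t)
{
    if (t->valid < RSB_ENTRIES)
        t->valid++;
    /* When full, the oldest entry is overwritten and the count stays at max. */
}

/* Hypothetical refill: models filling every entry with a benign target. */
static void rsb_stuff(struct rsb_tracker *t)
{
    t->valid = RSB_ENTRIES;
    t->stuffs++;
}

/*
 * Instrumented RET: returns true if this RET would have underflowed
 * the RSB, in which case the RSB is refilled before the RET consumes an entry.
 */
static bool track_ret(struct rsb_tracker *t)
{
    bool underflow = (t->valid == 0);

    if (underflow)
        rsb_stuff(t);
    t->valid--;
    return underflow;
}
```

A chain of 20 calls followed by 20 returns triggers exactly one refill in this model: the first 16 returns are covered by real entries and the 17th would underflow.
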
^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
@ 2018-01-23 11:13 Liran Alon
  2018-01-25 22:20 ` Dave Hansen
  0 siblings, 1 reply; 120+ messages in thread
From: Liran Alon @ 2018-01-23 11:13 UTC (permalink / raw)
  To: dwmw2
  Cc: labbott, luto, Janakarajan.Natarajan, torvalds, bp,
	asit.k.mallick, rkrcmar, dave.hansen, karahmed, hpa, mingo,
	jun.nakajima, x86, ashok.raj, arjan.van.de.ven, tim.c.chen,
	pbonzini, ak, linux-kernel, peterz, tglx, gregkh, mhiramat,
	arjan, thomas.lendacky, dan.j.williams, joro, kvm, aarcange


----- dwmw2@infradead.org wrote:

> On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> > On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse
> <dwmw2@infradead.org> wrote:
> > >>
> > >> The patches do things like add the garbage MSR writes to the
> kernel
> > >> entry/exit points. That's insane. That says "we're trying to
> protect
> > >> the kernel".  We already have retpoline there, with less
> overhead.
> > >
> > > You're looking at IBRS usage, not IBPB. They are different
> things.
> > 
> > Ehh. Odd intel naming detail.
> > 
> > If you look at this series, it very much does that kernel
> entry/exit
> > stuff. It was patch 10/10, iirc. In fact, the patch I was replying
> to
> > was explicitly setting that garbage up.
> > 
> > And I really don't want to see these garbage patches just
> mindlessly
> > sent around.
> 
> I think we've covered the technical part of this now, not that you
> like
> it — not that any of us *like* it. But since the peanut gallery is
> paying lots of attention it's probably worth explaining it a little
> more for their benefit.
> 
> This is all about Spectre variant 2, where the CPU can be tricked
> into
> mispredicting the target of an indirect branch. And I'm specifically
> looking at what we can do on *current* hardware, where we're limited
> to
> the hacks they can manage to add in the microcode.
> 
> The new microcode from Intel and AMD adds three new features.
> 
> One new feature (IBPB) is a complete barrier for branch prediction.
> After frobbing this, no branch targets learned earlier are going to
> be
> used. It's kind of expensive (order of magnitude ~4000 cycles).
> 
> The second (STIBP) protects a hyperthread sibling from following
> branch
> predictions which were learned on another sibling. You *might* want
> this when running unrelated processes in userspace, for example. Or
> different VM guests running on HT siblings.
> 
> The third feature (IBRS) is more complicated. It's designed to be
> set when you enter a more privileged execution mode (i.e. the
> kernel).
> It prevents branch targets learned in a less-privileged execution
> mode,
> BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not
> just
> a 'set-and-forget' feature, it also has barrier-like semantics and
> needs to be set on *each* entry into the kernel (from userspace or a
> VM
> guest). It's *also* expensive. And a vile hack, but for a while it
> was
> the only option we had.
> 
> Even with IBRS, the CPU cannot tell the difference between different
> userspace processes, and between different VM guests. So in addition
> to
> IBRS to protect the kernel, we need the full IBPB barrier on context
> switch and vmexit. And maybe STIBP while they're running.
> 
> Then along came Paul with the cunning plan of "oh, indirect branches
> can be exploited? Screw it, let's not have any of *those* then",
> which
> is retpoline. And it's a *lot* faster than frobbing IBRS on every
> entry
> into the kernel. It's a massive performance win.
> 
> So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
> on context switches/vmexit (which is in the first part of this patch
> series before IBRS is added), and we're safe. We even refactored the
> patch series to put retpoline first.
> 
> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.
> 
> Then there's Skylake, and that generation of CPU cores. For
> complicated
> reasons they actually end up being vulnerable not just on indirect
> branches, but also on a 'ret' in some circumstances (such as 16+
> CALLs
> in a deep chain).
> 
> The IBRS solution, ugly though it is, did address that. Retpoline
> doesn't. There are patches being floated to detect and prevent deep
> stacks, and deal with some of the other special cases that bite on
> SKL,
> but those are icky too. And in fact IBRS performance isn't anywhere
> near as bad on this generation of CPUs as it is on earlier CPUs
> *anyway*, which makes it not quite so insane to *contemplate* using
> it
> as Intel proposed.
> 
> That's why my initial idea, as implemented in this RFC patchset, was
> to
> stick with IBRS on Skylake, and use retpoline everywhere else. I'll
> give you "garbage patches", but they weren't being "just mindlessly
> sent around". If we're going to drop IBRS support and accept the
> caveats, then let's do it as a conscious decision having seen what it
> would look like, not just drop it quietly because poor Davey is too
> scared that Linus might shout at him again. :)
> 
> I have seen *hand-wavy* analyses of the Skylake thing that mean I'm
> not
> actually lying awake at night fretting about it, but nothing concrete
> that really says it's OK.
> 
> If you view retpoline as a performance optimisation, which is how it
> first arrived, then it's rather unconventional to say "well, it only
> opens a *little* bit of a security hole but it does go nice and fast
> so
> let's do it".
> 
> But fine, I'm content with ditching the use of IBRS to protect the
> kernel, and I'm not even surprised. There's a *reason* we put it last
> in the series, as both the most contentious and most dispensable
> part.
> I'd be *happier* with a coherent analysis showing Skylake is still
> OK,
> but hey-ho, screw Skylake.
> 
> The early part of the series adds the new feature bits and detects
> when
> it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
> supports the IBPB barrier that we need to make retpoline complete.
> That
> much I think we definitely *do* want. There have been a bunch of us
> working on this behind the scenes; one of us will probably post that
> bit in the next day or so.
> 
> I think we also want to expose IBRS to VM guests, even if we don't
> use
> it ourselves. Because Windows guests (and RHEL guests; yay!) do use
> it.
> 
> If we can be done with the shouty part, I'd actually quite like to
> have
> a sensible discussion about when, if ever, we do IBPB on context
> switch
> (ptraceability and dumpable have both been suggested) and when, if
> ever, we set STIPB in userspace.

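The plan described in the quoted message can be condensed into a small decision function. This is a sketch only: struct cpu_caps, struct v2_plan and choose_plan() are hypothetical and do not correspond to kernel flags or code. It encodes what the quote states: IBPB on context switch and vmexit wherever the new microcode exists, IBRS on kernel entry only on Skylake-era cores (as in this RFC), retpoline elsewhere, and IBRS exposed to guests either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the CPU and the kernel build. */
struct cpu_caps {
    bool has_ibrs_ibpb;   /* new microcode: SPEC_CTRL and PRED_CMD present */
    bool skylake_era;     /* RET can be mispredicted on deep call chains */
    bool retpoline_build; /* kernel built with a retpoline-capable compiler */
};

/* Hypothetical summary of the Spectre v2 mitigations in use. */
struct v2_plan {
    bool retpoline;             /* indirect branches go through retpoline thunks */
    bool ibrs_on_kernel_entry;  /* write IBRS on every entry (RFC 10/10) */
    bool ibpb_on_switch;        /* IBPB on context switch and vmexit */
    bool expose_ibrs_to_guests; /* guests may use SPEC_CTRL themselves */
};

/* Sketch of the RFC's intent: IBRS on Skylake, retpoline everywhere else. */
static struct v2_plan choose_plan(const struct cpu_caps *c)
{
    struct v2_plan p = { 0 };

    p.ibpb_on_switch = c->has_ibrs_ibpb;
    p.expose_ibrs_to_guests = c->has_ibrs_ibpb;
    if (c->skylake_era && c->has_ibrs_ibpb)
        p.ibrs_on_kernel_entry = true;
    else
        p.retpoline = c->retpoline_build;
    return p;
}
```

Dropping IBRS for kernel protection, as the quote concedes, amounts to removing the Skylake branch so every CPU takes the retpoline path.
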
It is also important to note that current solutions, as I understand them, still have info-leak issues.

If retpoline is being used, user-mode code can leak RSB entries created while the CPU was in kernel mode, thereby breaking KASLR.
To handle this, every exit from kernel mode to user mode should stuff the RSB. In addition, this stuffing may need to be done from a fixed address, to avoid leaking the address of the RSB-stuffing code itself. The same concept applies to VMEntry into guests: the hypervisor should stuff the RSB just before VMEntry, otherwise the guest will be able to leak RSB entries that reveal hypervisor addresses.

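To illustrate the RSB point above, here is a toy model in C (not kernel code). The RSB is modelled as a 16-entry circular stack (an assumed depth), and any upper-half entry other than the stuffing site counts as a KASLR leak. The stuffing site is assumed to live at an address that KASLR does not randomize, per the fixed-address remark above; rsb_push(), rsb_stuff() and rsb_leaks_kaslr() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RSB_ENTRIES 16                    /* assumed RSB depth */
#define KERNEL_HALF 0xffff800000000000ULL /* start of the x86-64 upper half */

/* Toy RSB: a circular stack of predicted return addresses. */
struct rsb {
    uint64_t slot[RSB_ENTRIES];
    size_t top; /* next slot a CALL will write */
};

/* Each CALL records its return address, overwriting the oldest entry. */
static void rsb_push(struct rsb *r, uint64_t ret_addr)
{
    r->slot[r->top] = ret_addr;
    r->top = (r->top + 1) % RSB_ENTRIES;
}

/*
 * Stuffing: RSB_ENTRIES dummy CALLs, all issued from one code location,
 * so every entry ends up pointing at the same return site.
 */
static void rsb_stuff(struct rsb *r, uint64_t stuff_ret_site)
{
    for (size_t i = 0; i < RSB_ENTRIES; i++)
        rsb_push(r, stuff_ret_site);
}

/*
 * Could user mode learn a randomized kernel address from the RSB?
 * 'fixed_site' is assumed not to be randomized, so seeing it reveals nothing.
 */
static bool rsb_leaks_kaslr(const struct rsb *r, uint64_t fixed_site)
{
    for (size_t i = 0; i < RSB_ENTRIES; i++)
        if (r->slot[i] >= KERNEL_HALF && r->slot[i] != fixed_site)
            return true;
    return false;
}
```

If the stuffing sequence itself ran from a randomized kernel address, every stuffed entry would reveal that address, which is why the stuffing location matters.
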
If IBRS is being used, things seem to be even worse.
IBRS prevents BTB entries created at a lower prediction mode from being used by higher-prediction-mode code.
However, nothing seems to prevent lower-prediction-mode code from using BTB entries created by higher-prediction-mode code. This means that user-mode code could leak BTB entries to break KASLR, and guests could leak the host's BTB entries to reveal hypervisor addresses. This seems to be an issue even with future CPUs that will have the "IBRS all-the-time" feature.
Note that this issue is not theoretical. This is exactly what Google's Project Zero KVM PoC did: it leaked the host's BTB entries to reveal the kvm-intel.ko, kvm.ko and vmlinux addresses.
It seems that the correct way to really handle this scenario is to tag every BTB entry with its prediction mode and have the CPU use only BTB entries tagged with the current prediction mode, thereby entirely separating BTB entries between prediction modes. That, in my opinion, should replace the IBRS feature.

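The proposed mode tag can be illustrated with a toy BTB. Assumptions: entries are indexed and matched on only the low 31 bits of the branch source (following the Project Zero description cited in this thread), and each entry stores only low target bits. Real predictors use hashed indices and partial tags, and all names here are hypothetical. The model shows the aliasing Liran describes and what a per-mode tag would change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTB_SETS   4096u /* assumed size */
#define INDEX_BITS 31    /* assumed: only low source bits are used */

enum pred_mode { MODE_USER, MODE_KERNEL };

struct btb_entry {
    bool valid;
    enum pred_mode mode; /* the tag proposed above */
    uint64_t src_low;    /* low bits of the branch source */
    uint64_t target_low; /* low bits of the predicted target */
};

struct btb {
    struct btb_entry e[BTB_SETS];
};

static uint64_t low_bits(uint64_t addr)
{
    return addr & ((1ULL << INDEX_BITS) - 1);
}

/* A branch executed in 'mode' trains the entry selected by its source. */
static void btb_train(struct btb *b, enum pred_mode mode, uint64_t src, uint64_t target)
{
    uint64_t s = low_bits(src);
    struct btb_entry *ent = &b->e[s % BTB_SETS];

    ent->valid = true;
    ent->mode = mode;
    ent->src_low = s;
    ent->target_low = low_bits(target);
}

/* Returns true (and the predicted low target bits) if the lookup hits. */
static bool btb_lookup(const struct btb *b, enum pred_mode mode, uint64_t src,
                       bool mode_tagged, uint64_t *target_out)
{
    uint64_t s = low_bits(src);
    const struct btb_entry *ent = &b->e[s % BTB_SETS];

    if (!ent->valid || ent->src_low != s)
        return false;
    if (mode_tagged && ent->mode != mode)
        return false; /* proposed: other modes' entries are invisible */
    *target_out = ent->target_low;
    return true;
}
```

In the untagged case a user-mode branch at 0x01001234 hits the entry trained by a kernel branch at 0xffffffff81001234 and observes the low bits of the kernel target; with mode_tagged set, the same lookup misses.
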
-Liran

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 10:44                           ` Ingo Molnar
  (?)
@ 2018-01-23 10:57                           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-23 10:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven

[-- Attachment #1: Type: text/plain, Size: 2259 bytes --]

On Tue, 2018-01-23 at 11:44 +0100, Ingo Molnar wrote:
> * David Woodhouse <dwmw2@infradead.org> wrote:
> > Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get 
> > to the point where we can patch *that* out into a NOP... or are you saying we 
> > already can?
> Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a 
> NOP today - all 50,000+ call sites on a typical distro kernel.
> 
> We did so for a long time - this is all a well established, working mechanism.

That's neat; I'd missed that.

> > But this is a digression. I was being pedantic about the "0 cycles" but sure, 
> > this would be perfectly tolerable.
> It's not a digression in two ways:
> 
> - I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
>   mechanism for non-SkyLake CPUs, literally.
> 
> - I noticed that Meltdown and the CR3 writes for PTI appears to have established a
>   kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
>   the per-syscall MSR write nonsense patch of the SkyLake workaround.
>   That attitude is totally unacceptable to me as x86 maintainer and yes, still
>   every cycle counts.

Yeah, absolutely. But here we're talking about the overhead on non-SKL, 
and on non-SKL the IBRS overhead is zero too (well, again not precisely
zero because it turns into NOPs).

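For readers less familiar with ALTERNATIVE: each site is patched once at boot. If the feature bit is set, the replacement bytes are copied over the original and the remainder is padded with NOPs; otherwise the original stays. In the RFC's RESTRICT_IB_SPEC/UNRESTRICT_IB_SPEC the original is a jmp over the MSR write and the replacement is empty. apply_alternative() below is a simplified, hypothetical stand-in for the kernel's apply_alternatives(), using one-byte 0x90 NOPs where the kernel chooses optimal multi-byte NOPs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP1 0x90 /* one-byte NOP; the kernel uses longer optimal NOPs */

/*
 * Patch one site in place. 'site' holds the original instructions
 * (site_len bytes); 'repl' is the replacement and may be empty.
 */
static void apply_alternative(uint8_t *site, size_t site_len,
                              const uint8_t *repl, size_t repl_len,
                              bool feature_present)
{
    if (!feature_present)
        return; /* keep the original instructions */
    assert(repl_len <= site_len); /* site is sized for the longer variant */
    if (repl_len)
        memcpy(site, repl, repl_len);
    memset(site + repl_len, X86_NOP1, site_len - repl_len);
}
```

So with X86_FEATURE_IBRS set the jump is NOPed out and the MSR write runs; without it, the CPU executes a single jmp over the block.
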
You're absolutely right that we shouldn't stop counting cycles.

I've already noted that on SKL IBRS is actually a lot faster than on
earlier generations, and we also get back some of the overhead by
turning the retpoline into a bare jmp again. We haven't *forgotten*
about performance.

I'd like to see your solution once the details are sorted out, and see
proper benchmarks — both microbenchmarks and real workloads — comparing
the two. And then make a reasoned decision based on that, and on how
happy we are with the theoretical holes that your solution leaves, in
the cold light of day.

We should also look at whether we want to set STIBP too, which is
somewhat orthogonal to using IBRS to protect the kernel, and could end
up with some of the same MSR writes (at least setting to zero) on some
of the same code paths.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 10:27                         ` David Woodhouse
@ 2018-01-23 10:44                           ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23 10:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Tue, 2018-01-23 at 11:15 +0100, Ingo Molnar wrote:
> > 
> > BTW., the reason this is enabled on all distro kernels is because the overhead 
> > is  a single patched-in NOP instruction in the function epilogue, when tracing 
> > is  disabled. So it's not even a CALL+RET - it's a patched in NOP.
> 
> Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get 
> to the point where we can patch *that* out into a NOP... or are you saying we 
> already can?

Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a 
NOP today - all 50,000+ call sites on a typical distro kernel.

We have done so for a long time - this is all a well-established, working mechanism.

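A simplified model of the patching Ingo describes. Each 'call __fentry__' site is a 5-byte CALL (opcode E8 plus a 32-bit displacement relative to the next instruction). With tracing off, it is overwritten by a 5-byte NOP (here 0f 1f 44 00 00, nopl 0x0(%rax,%rax,1), a common choice on x86-64), and enabling tracing writes the CALL back. The real code must also modify live kernel text safely (for example via an int3 breakpoint step), which this sketch ignores; make_call(), enable_site() and disable_site() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_INSN_SIZE 5

/* 5-byte NOP "nopl 0x0(%rax,%rax,1)", a common choice on x86-64. */
static const uint8_t nop5[CALL_INSN_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "call target" at address ip: opcode E8 + rel32 from the next insn. */
static void make_call(uint8_t insn[CALL_INSN_SIZE], uint64_t ip, uint64_t target)
{
    int64_t disp = (int64_t)(target - (ip + CALL_INSN_SIZE));

    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* rel32 reaches +/-2 GiB */
    uint32_t rel = (uint32_t)disp;

    insn[0] = 0xe8;
    for (int i = 0; i < 4; i++)
        insn[1 + i] = (uint8_t)(rel >> (8 * i)); /* little-endian rel32 */
}

/* Tracing off: the site becomes a NOP, so the cost is decoding one NOP. */
static void disable_site(uint8_t insn[CALL_INSN_SIZE])
{
    memcpy(insn, nop5, CALL_INSN_SIZE);
}

/* Tracing on: restore the call to the tracer entry point. */
static void enable_site(uint8_t insn[CALL_INSN_SIZE], uint64_t ip, uint64_t tracer)
{
    make_call(insn, ip, tracer);
}
```
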
> But this is a digression. I was being pedantic about the "0 cycles" but sure, 
> this would be perfectly tolerable.

It's not a digression in two ways:

- I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
  mechanism for non-SkyLake CPUs, literally.

- I noticed that Meltdown and the CR3 writes for PTI appear to have established a
  kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
  the per-syscall MSR write nonsense patch of the SkyLake workaround.
  That attitude is totally unacceptable to me as x86 maintainer and yes, still
  every cycle counts.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 10:23                       ` Ingo Molnar
  (?)
@ 2018-01-23 10:35                       ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-23 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven

[-- Attachment #1: Type: text/plain, Size: 2289 bytes --]

On Tue, 2018-01-23 at 11:23 +0100, Ingo Molnar wrote:
> * David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > 
> > > 
> > > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> > > obviously all this code and data would be very cache hot. Given that the average 
> > > number of function calls per system call is around a dozen, this would be _much_ 
> > > faster than any microcode/MSR based approach.
> > That's kind of neat, except you don't want it at the top of the
> > function; you want it at the bottom.
> > 
> > If you could hijack the *return* site, then you could check for
> > underflow and stuff the RSB right there. But in __fentry__ there's not
> > a lot you can do other than complain that something bad is going to
> > happen in the future. You know that a string of 16+ rets is going to
> > happen, but you've got no gadget in *there* to deal with it when it
> > does.
>
> No, it can be done with the existing CALL instrumentation callback that 
> CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from 
> the CALL trampoline - see my previous email.

Yes, that's a neat solution.

> > 
> > HJ did have patches to turn 'ret' into a form of retpoline, which I
> > don't think ever even got performance-tested.
> Return instrumentation is possible as well, but there are two major drawbacks:
> 
>  - GCC support for it is not as widely available and return instrumentation is 
>    less tested in Linux kernel contexts

Hey, we're *already* making people upgrade their compiler, and HJ
apparently never sleeps. So don't actually be held back too much by
that consideration. If it could be better done with GCC help, we really
*can* explore that.

>  - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already 
>    enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs 
>    would be literally zero, while still allowing to fix the RSB vulnerability on 
>    SkyLake.

Sure. You still have a few holes to fix (or declare acceptable) to
bring it to the full coverage of the IBRS solution, and it's still
possible that by the time it's complete it's approaching the ick factor
of IBRS, but I'd love to see it.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23 10:15                       ` Ingo Molnar
@ 2018-01-23 10:27                         ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-23 10:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On Tue, 2018-01-23 at 11:15 +0100, Ingo Molnar wrote:
> 
> BTW., the reason this is enabled on all distro kernels is because the overhead is 
> a single patched-in NOP instruction in the function epilogue, when tracing is 
> disabled. So it's not even a CALL+RET - it's a patched in NOP.

Hm? We still have GCC emitting 'call __fentry__' don't we? Would be
nice to get to the point where we can patch *that* out into a NOP... or
are you saying we already can?

But this is a digression. I was being pedantic about the "0 cycles" but
sure, this would be perfectly tolerable.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  9:30                     ` David Woodhouse
@ 2018-01-23 10:23                       ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23 10:23 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven


* David Woodhouse <dwmw2@infradead.org> wrote:

> > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> > obviously all this code and data would be very cache hot. Given that the average 
> > number of function calls per system call is around a dozen, this would be _much_ 
> > faster than any microcode/MSR based approach.
> 
> That's kind of neat, except you don't want it at the top of the
> function; you want it at the bottom.
> 
> If you could hijack the *return* site, then you could check for
> underflow and stuff the RSB right there. But in __fentry__ there's not
> a lot you can do other than complain that something bad is going to
> happen in the future. You know that a string of 16+ rets is going to
> happen, but you've got no gadget in *there* to deal with it when it
> does.

No, it can be done with the existing CALL instrumentation callback that 
CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from 
the CALL trampoline - see my previous email.

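A toy model of the CALL-side hook pushing a return trampoline, similar in spirit to how the function-graph tracer intercepts returns. On entry the hook saves the real return address on a per-task shadow stack and overwrites the stack slot with the trampoline's address; when the function returns into the trampoline, it can do the RSB accounting and then continue to the saved address. hook_entry(), hook_return() and RET_TRAMPOLINE are hypothetical, and no real stack is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH   64
#define RET_TRAMPOLINE 0xffffffff81f00000ULL /* hypothetical trampoline address */

/* Per-task shadow stack of the real return addresses that were replaced. */
struct shadow_stack {
    uint64_t ret[SHADOW_DEPTH];
    size_t depth;
    unsigned returns_seen; /* stand-in for RSB accounting at each return */
};

/*
 * CALL-side hook, given the return-address slot on the (simulated) stack.
 * Returns 0 on success, or -1 if the shadow stack is full, in which case
 * the function returns normally and is simply not tracked.
 */
static int hook_entry(struct shadow_stack *ss, uint64_t *ret_slot)
{
    if (ss->depth == SHADOW_DEPTH)
        return -1;
    ss->ret[ss->depth++] = *ret_slot;
    *ret_slot = RET_TRAMPOLINE; /* the function will now return into us */
    return 0;
}

/* Body of the return trampoline: account for the return, then go back. */
static uint64_t hook_return(struct shadow_stack *ss)
{
    assert(ss->depth > 0);
    ss->returns_seen++;          /* RSB underflow check would go here */
    return ss->ret[--ss->depth]; /* real return address to continue at */
}
```

In a real kernel the trampoline's final transfer to the saved address is itself an indirect branch and would need protecting too (for example with a retpoline). For scale, Ingo's estimate of 2-3 cycles per call at roughly a dozen calls per system call works out to about 24-36 extra cycles per syscall.
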
> HJ did have patches to turn 'ret' into a form of retpoline, which I
> don't think ever even got performance-tested.

Return instrumentation is possible as well, but there are two major drawbacks:

 - GCC support for it is not as widely available and return instrumentation is 
   less tested in Linux kernel contexts

 - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already 
   enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs 
   would be literally zero, while still allowing us to fix the RSB vulnerability on 
   SkyLake.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  9:30                     ` David Woodhouse
@ 2018-01-23 10:15                       ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23 10:15 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Tue, 2018-01-23 at 08:53 +0100, Ingo Molnar wrote:
> > 
> > The patch below demonstrates the principle, it forcibly enables dynamic ftrace 
> > patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__ into a RET:
> > 
> >   ffffffff81a01a40 <__fentry__>:
> >   ffffffff81a01a40:       c3                      retq   
> > 
> > This would have to be extended with (very simple) call stack depth tracking (just 
> > 3 more instructions would do in the fast path I believe) and a suitable SkyLake 
> > workaround (and also has to play nice with the ftrace callbacks).
> > 
> > On non-SkyLake the overhead would be 0 cycles.
> 
> The overhead of forcing CONFIG_DYNAMIC_FTRACE=y is precisely zero
> cycles? That seems a little optimistic. ;)

The overhead of the quick hack patch I sent to show what exact code I mean is 
obviously not zero.

The overhead of my proposed solution, which utilizes the function call callback 
that CONFIG_DYNAMIC_FTRACE=y provides, is exactly zero on non-SkyLake systems, 
where the callback stays patched out on typical Linux distros.

The callback is widely enabled on distro kernels:

  Fedora:                    CONFIG_DYNAMIC_FTRACE=y
  Ubuntu:                    CONFIG_DYNAMIC_FTRACE=y
  OpenSuse (default flavor): CONFIG_DYNAMIC_FTRACE=y

BTW., the reason this is enabled on all distro kernels is that the overhead is 
a single patched-in NOP instruction in the function prologue when tracing is 
disabled. So it's not even a CALL+RET - it's a patched-in NOP.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  9:27                     ` Ingo Molnar
@ 2018-01-23  9:37                       ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-23  9:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On Tue, 2018-01-23 at 10:27 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 
> > Is there a description of the exact speculative execution vulnerability that has 
> > to be addressed to begin with?
>
> Ok, so for now I'm assuming that this is the 16 entries return-stack-buffer 
> underflow condition where SkyLake falls back to the branch predictor (while other 
> CPUs wrap the buffer).

Yep.

> > 
> > If this approach is workable I'd much prefer it to any MSR writes in the syscall 
> > entry path not just because it's fast enough in practice to not be turned off by 
> > everyone, but also because everyone would agree that per function call overhead 
> > needs to go away on new CPUs. Both deployment and backporting are also _much_ more 
> > flexible, simpler, faster and more complete than microcode/firmware or compiler 
> > based solutions.
> > 
> > Assuming the vulnerability can be addressed via this route that is, which is a big 
> > assumption!
>
> So I talked this over with PeterZ, and I think it's all doable:
> 
>  - the CALL __fentry__ callbacks maintain the depth tracking (on the kernel 
>    stack, fast to access), and issue an "RSB-stuffing sequence" when depth reaches
>    16 entries.
> 
>  - "the RSB-stuffing sequence" is a return trampoline that pushes a CALL on the 
>    stack which is executed on the RET.

That's neat. We'll want to make sure the unwinder can cope but hey,
Peter *loves* hacking objtool, right? :)

>  - All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The 
>    tracking could probably be made IRQ and maybe even NMI safe, but the worst-case 
>    nesting scenarios make my head ache.)
> 
> I.e. IBRS can be mostly replaced with a kernel based solution that is better than 
> IBRS and which does not negatively impact any other non-SkyLake CPUs or general 
> code quality.
> 
> I.e. a full upstream Spectre solution.

Sounds good. I look forward to seeing it.

In the meantime I'll resend the basic bits of the feature detection and
especially turning off KPTI when RDCL_NO is set.

We do also want to do IBPB even with retpoline, so I'll send those
patches for KVM and context switch. There is some bikeshedding to be
done there about the precise conditions under which we do it.

Finally, KVM should be *exposing* IBRS to guests even if we don't use
it ourselves. We'll do that too.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  7:53                   ` Ingo Molnar
@ 2018-01-23  9:30                     ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-23  9:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On Tue, 2018-01-23 at 08:53 +0100, Ingo Molnar wrote:
> 
> The patch below demonstrates the principle, it forcibly enables dynamic ftrace 
> patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__ into a RET:
> 
>   ffffffff81a01a40 <__fentry__>:
>   ffffffff81a01a40:       c3                      retq   
> 
> This would have to be extended with (very simple) call stack depth tracking (just 
> 3 more instructions would do in the fast path I believe) and a suitable SkyLake 
> workaround (and also has to play nice with the ftrace callbacks).
> 
> On non-SkyLake the overhead would be 0 cycles.

The overhead of forcing CONFIG_DYNAMIC_FTRACE=y is precisely zero
cycles? That seems a little optimistic. ;)

I'll grant you that if it goes straight to a 'ret', it isn't *that* high,
though.

> On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> obviously all this code and data would be very cache hot. Given that the average 
> number of function calls per system call is around a dozen, this would be _much_ 
> faster than any microcode/MSR based approach.

That's kind of neat, except you don't want it at the top of the
function; you want it at the bottom.

If you could hijack the *return* site, then you could check for
underflow and stuff the RSB right there. But in __fentry__ there's not
a lot you can do other than complain that something bad is going to
happen in the future. You know that a string of 16+ rets is going to
happen, but you've got no gadget in *there* to deal with it when it
does.

HJ did have patches to turn 'ret' into a form of retpoline, which I
don't think ever even got performance-tested. They'd have forced a
mispredict on *every* ret. A cheaper option might be to turn ret into a
'jmp skylake_ret_hack'. On pre-SKL that would be a bare ret, while on
SKL+ it could do the counting (in conjunction with a
'per_cpu(call_depth)++' in __fentry__) and stuff the RSB before
actually returning, when appropriate.

By the time you've made it work properly, I suspect we're approaching
the barf-factor of IBRS, for a less complete solution.

> Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 

Andi's been experimenting at 
https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=spec/deep-chain-3

> Is there a description of the exact speculative execution vulnerability that has 
> to be addressed to begin with?

"It takes predictions from the generic branch target buffer when the
RSB underflows".

IBRS filters what can come from the BTB, and resolves the problem that
way. Retpoline avoids the indirect branches that on *earlier* CPUs were
the only things that would use the offending predictions. But on SKL,
now 'ret' is one of the problematic instructions too. Fun! :)

> If this approach is workable I'd much prefer it to any MSR writes in the syscall 
> entry path not just because it's fast enough in practice to not be turned off by 
> everyone, but also because everyone would agree that per function call overhead 
> needs to go away on new CPUs. Both deployment and backporting are also _much_ more 
> flexible, simpler, faster and more complete than microcode/firmware or compiler 
> based solutions.
> 
> Assuming the vulnerability can be addressed via this route that is, which is a big 
> assumption!

I think it's close. There are some other cases which empty the RSB,
like sleeping and loading microcode, which can happily be
special-cased. Andi has already rounded up many of the remaining details at 
https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=spec/skl-rsb-3

And there's SMI, which is a pain, but I think Linus is right that we can
possibly just stick our fingers in our ears and pretend we didn't hear
about that one, as it's likely to be hard to trigger (famous last
words).

On the whole though, I think you can see why we're keeping IBRS around
for now, sent out purely as an RFC and rebased on top of the stuff
we're *actually* sending to Linus for inclusion.

When we have a clear idea of what we're doing for Skylake, it'll be
useful to have a proper comparison of the security, the performance and
the "ick" factor of whatever we come up with, vs. IBRS.

Right now the plan is just "screw Skylake"; we'll just forget it's a
special snowflake and treat it like everything else, except for a bit
of extra RSB-stuffing on context switch (since we had to add that for
!SMEP anyway). And that's not *entirely* unreasonable but as I said I'd
*really* like to have a decent analysis of the implications of that,
not just some hand-wavy "nah, it'll be fine".

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  7:53                   ` Ingo Molnar
@ 2018-01-23  9:27                     ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23  9:27 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven


* Ingo Molnar <mingo@kernel.org> wrote:

> Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 
> Is there a description of the exact speculative execution vulnerability that has 
> to be addressed to begin with?

Ok, so for now I'm assuming that this is the 16-entry return-stack-buffer (RSB) 
underflow condition where SkyLake falls back to the branch predictor (while other 
CPUs wrap the buffer).

> If this approach is workable I'd much prefer it to any MSR writes in the syscall 
> entry path not just because it's fast enough in practice to not be turned off by 
> everyone, but also because everyone would agree that per function call overhead 
> needs to go away on new CPUs. Both deployment and backporting are also _much_ more 
> flexible, simpler, faster and more complete than microcode/firmware or compiler 
> based solutions.
> 
> Assuming the vulnerability can be addressed via this route that is, which is a big 
> assumption!

So I talked this over with PeterZ, and I think it's all doable:

 - the CALL __fentry__ callbacks maintain the depth tracking (on the kernel 
   stack, fast to access), and issue an "RSB-stuffing sequence" when depth reaches
   16 entries.

 - "the RSB-stuffing sequence" is a return trampoline that pushes a CALL on the 
   stack which is executed on the RET.

 - All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The 
   tracking could probably be made IRQ and maybe even NMI safe, but the worst-case 
   nesting scenarios make my head ache.)

I.e. IBRS can be mostly replaced with a kernel based solution that is better than 
IBRS and which does not negatively impact any other non-SkyLake CPUs or general 
code quality.

I.e. a full upstream Spectre solution.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-23  7:29                 ` Ingo Molnar
@ 2018-01-23  7:53                   ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23  7:53 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven


* Ingo Molnar <mingo@kernel.org> wrote:

> * David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > But wait, why did I say "mostly"? Well, not everyone has a retpoline
> > compiler yet... but OK, screw them; they need to update.
> > 
> > Then there's Skylake, and that generation of CPU cores. For complicated
> > reasons they actually end up being vulnerable not just on indirect
> > branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
> > in a deep chain).
> > 
> > The IBRS solution, ugly though it is, did address that. Retpoline
> > doesn't. There are patches being floated to detect and prevent deep
> > stacks, and deal with some of the other special cases that bite on SKL,
> > but those are icky too. And in fact IBRS performance isn't anywhere
> > near as bad on this generation of CPUs as it is on earlier CPUs
> > *anyway*, which makes it not quite so insane to *contemplate* using it
> > as Intel proposed.
> 
> There's another possible method to avoid deep stacks on Skylake, without compiler 
> support:
> 
>   - Use the existing mcount based function tracing live patching machinery
>     (CONFIG_FUNCTION_TRACER=y) to install a _very_ fast and simple stack depth 
>     tracking tracer which would issue a retpoline when stack depth crosses 
>     boundaries of ~16 entries.

The patch below demonstrates the principle: it forcibly enables dynamic ftrace 
patching (CONFIG_DYNAMIC_FTRACE=y et al.) and turns mcount/__fentry__ into a RET:

  ffffffff81a01a40 <__fentry__>:
  ffffffff81a01a40:       c3                      retq   

This would have to be extended with (very simple) call stack depth tracking (just 
3 more instructions would do in the fast path I believe) and a suitable SkyLake 
workaround (and also has to play nice with the ftrace callbacks).

On non-SkyLake the overhead would be 0 cycles.

On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
obviously all this code and data would be very cache hot. Given that the average 
number of function calls per system call is around a dozen, this would be _much_ 
faster than any microcode/MSR based approach.

Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 
Is there a description of the exact speculative execution vulnerability that has 
to be addressed to begin with?

If this approach is workable, I'd much prefer it to any MSR writes in the syscall 
entry path, not just because it's fast enough in practice that people won't simply 
turn it off, but also because everyone would agree that per-function-call overhead 
needs to go away on new CPUs. Both deployment and backporting are also _much_ more 
flexible, simpler, faster and more complete than with microcode/firmware or 
compiler-based solutions.

That is, assuming the vulnerability can be addressed via this route, which is a big 
assumption!

Thanks,

	Ingo

 arch/x86/Kconfig            | 3 +++
 arch/x86/kernel/ftrace_64.S | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 423e4b64e683..df471538a79c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -133,6 +133,8 @@ config X86
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
+	select DYNAMIC_FTRACE
+	select DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT			if X86_64
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EXIT_THREAD
@@ -140,6 +142,7 @@ config X86
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
+	select FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT
 	select HAVE_IDE
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 7cb8ba08beb9..1e219e0f2887 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -19,6 +19,7 @@ EXPORT_SYMBOL(__fentry__)
 # define function_hook	mcount
 EXPORT_SYMBOL(mcount)
 #endif
+	ret
 
 /* All cases save the original rbp (8 bytes) */
 #ifdef CONFIG_FRAME_POINTER

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-22 16:27               ` David Woodhouse
@ 2018-01-23  7:29                 ` Ingo Molnar
  -1 siblings, 0 replies; 120+ messages in thread
From: Ingo Molnar @ 2018-01-23  7:29 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List,
	Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
	Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
	Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

* David Woodhouse <dwmw2@infradead.org> wrote:

> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.
> 
> Then there's Skylake, and that generation of CPU cores. For complicated
> reasons they actually end up being vulnerable not just on indirect
> branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
> in a deep chain).
> 
> The IBRS solution, ugly though it is, did address that. Retpoline
> doesn't. There are patches being floated to detect and prevent deep
> stacks, and deal with some of the other special cases that bite on SKL,
> but those are icky too. And in fact IBRS performance isn't anywhere
> near as bad on this generation of CPUs as it is on earlier CPUs
> *anyway*, which makes it not quite so insane to *contemplate* using it
> as Intel proposed.

There's another possible method to avoid deep stacks on Skylake, without compiler 
support:

  - Use the existing mcount based function tracing live patching machinery
    (CONFIG_FUNCTION_TRACER=y) to install a _very_ fast and simple stack depth 
    tracking tracer which would issue a retpoline when stack depth crosses 
    boundaries of ~16 entries.

The overhead of that would _still_ very likely be much lower than an MSR write 
costing hundreds (or thousands) of cycles at every kernel entry (syscall entry, 
IRQ entry, etc.).

Note the huge number of advantages:

 - All distro kernels already enable the mcount-based patching options, so there's
   literally zero overhead on anything except SkyLake.

 - It is based entirely on kernel patching and can be activated on Skylake only

 - It doesn't require any microcode updates, so it will work on all existing CPUs
   with no firmware or microcode modifications

 - It doesn't require any compiler updates

 - SkyLake performance is very likely to be much less fragile than relying on a 
   hastily deployed microcode hack

 - The "SkyLake stack depth tracer" can be tested on other CPUs as well in debug 
   builds, broadening the testing base

 - The tracer is very obviously simple and reviewable, and we can forget about it
   in the far future.

 - It's much more backportable to older kernels: should there be a new class of
   exploits, this machinery could be updated to cover that too, while
   upgrades to newer kernels would give the higher-performance solution.
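The "activated on Skylake only" point amounts to a boot-time CPU check. Below is a minimal sketch, assuming the decision is made from the CPUID leaf 1 signature. The family/model decoding follows the documented Intel rules. The model list (Skylake client and server, Kaby Lake) is an illustrative assumption about what "Skylake and that generation" covers, not a list taken from any posted patch, and `want_depth_tracker()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode display family/model from CPUID leaf 1 EAX. */
static void cpuid1_family_model(uint32_t eax, unsigned int *family,
				unsigned int *model)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;

	assert(family && model);

	*family = base_family;
	if (base_family == 0xf)
		*family += (eax >> 20) & 0xff;	/* extended family */

	*model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		*model |= ((eax >> 16) & 0xf) << 4;	/* extended model */
}

/* Hypothetical gate: enable the depth tracker on Skylake-era Intel cores only. */
static bool want_depth_tracker(bool is_intel, uint32_t cpuid1_eax)
{
	unsigned int family, model;

	if (!is_intel)
		return false;
	cpuid1_family_model(cpuid1_eax, &family, &model);
	if (family != 6)
		return false;

	switch (model) {
	case 0x4e:	/* Skylake mobile */
	case 0x5e:	/* Skylake desktop */
	case 0x55:	/* Skylake server */
	case 0x8e:	/* Kaby Lake mobile */
	case 0x9e:	/* Kaby Lake desktop */
		return true;
	default:
		return false;
	}
}
```

For example, a Skylake desktop signature of 0x506E3 decodes to family 6, model 0x5E and enables the tracker, while a Haswell signature of 0x306C3 (model 0x3C) leaves it off.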

Yes, there are some practical complications, such as always enabling 
CONFIG_FUNCTION_TRACER=y on x86 and sorting out the ftrace interaction, but in 
practice that option is enabled on all major distros anyway, due to ftrace.

Is there any reason why this wouldn't work?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
@ 2018-01-22 22:15 Luke Kenneth Casson Leighton
  0 siblings, 0 replies; 120+ messages in thread
From: Luke Kenneth Casson Leighton @ 2018-01-22 22:15 UTC (permalink / raw)
  To: David Woodhouse, torvalds, Linux Kernel Mailing List

[apologies for breaking the reply-thread]

David wrote:

> I think we've covered the technical part of this now, not that you like
> it - not that any of us *like* it. But since the peanut gallery is
> paying lots of attention it's probably worth explaining it a little
> more for their benefit.

i'm in taiwan (happily just accidentally landed in a position to do a
meltdown-less, spectre-less entirely libre RISC-V SoC), i got
wasabi-flavoured crisps imported from korea, and a bag of pistachios
that come with their own moisture absorbing sachet, does that count?

david, there is actually a significant benefit to what you're doing,
not just peanut-gallery-ing: this is a cluster-f*** where every single
intel (and amd) engineer is prevented and prohibited from talking
directly to you as they develop the microcode.  they're effectively
indentured slaves (all employees are), and they've been ignored
and demoralised.  it's a lesson that i'm not sure their management
are capable of learning, even though the head of the intel open source
innovation centre has been trying to get it across to them for many
years: OPEN UP THE FUCKING FIRMWARE AND MICROCODE.

so unfortunately, the burden is on you, the members of the linux
kernel team, to read between the lines, express things clearly here
on LKML so that the intel engineers who are NOT PERMITTED
to talk directly to you can at least get some clear feedback.
the burden is therefore *on you* - like it or not - to indicate *to them*
that you fully grasp the technical situation... whilst at the same time
not being permitted access to the fucking microcode gaah what
a cluster-f*** anyway you get my drift, right?  you're doing the
right thing.

anyway good luck, it's all highly entertaining, but please don't forget
that you have a huge responsibility here.  oh, and intel management?
this situation is your equivalent of heartbleed and shellshock.  you get
your fucking act together and put a much larger contribution into some
pot somewhere e.g. the linux foundation, to make up for fucking
around and freeloading off of the linux kernel team's expertise and
time, d'ya get my drift?

l.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 22:27             ` Linus Torvalds
@ 2018-01-22 16:27               ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-22 16:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
	Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
	Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

[-- Attachment #1: Type: text/plain, Size: 5893 bytes --]

On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> >>
> >> The patches do things like add the garbage MSR writes to the kernel
> >> entry/exit points. That's insane. That says "we're trying to protect
> >> the kernel".  We already have retpoline there, with less overhead.
> >
> > You're looking at IBRS usage, not IBPB. They are different things.
> 
> Ehh. Odd intel naming detail.
> 
> If you look at this series, it very much does that kernel entry/exit
> stuff. It was patch 10/10, iirc. In fact, the patch I was replying to
> was explicitly setting that garbage up.
> 
> And I really don't want to see these garbage patches just mindlessly
> sent around.

I think we've covered the technical part of this now, not that you like
it — not that any of us *like* it. But since the peanut gallery is
paying lots of attention it's probably worth explaining it a little
more for their benefit.

This is all about Spectre variant 2, where the CPU can be tricked into
mispredicting the target of an indirect branch. And I'm specifically
looking at what we can do on *current* hardware, where we're limited to
the hacks they can manage to add in the microcode.

The new microcode from Intel and AMD adds three new features.

One new feature (IBPB) is a complete barrier for branch prediction.
After frobbing this, no branch targets learned earlier are going to be
used. It's kind of expensive (order of magnitude ~4000 cycles).

The second (STIBP) protects a hyperthread sibling from following branch
predictions which were learned on another sibling. You *might* want
this when running unrelated processes in userspace, for example. Or
different VM guests running on HT siblings.

The third feature (IBRS) is more complicated. It's designed to be
set when you enter a more privileged execution mode (i.e. the kernel).
It prevents branch targets learned in a less-privileged execution mode,
BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not just
a 'set-and-forget' feature, it also has barrier-like semantics and
needs to be set on *each* entry into the kernel (from userspace or a VM
guest). It's *also* expensive. And a vile hack, but for a while it was
the only option we had.

Even with IBRS, the CPU cannot tell the difference between different
userspace processes, and between different VM guests. So in addition to
IBRS to protect the kernel, we need the full IBPB barrier on context
switch and vmexit. And maybe STIBP while they're running.

Then along came Paul with the cunning plan of "oh, indirect branches
can be exploited? Screw it, let's not have any of *those* then", which
is retpoline. And it's a *lot* faster than frobbing IBRS on every entry
into the kernel. It's a massive performance win.

So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
on context switches/vmexit (which is in the first part of this patch
series before IBRS is added), and we're safe. We even refactored the
patch series to put retpoline first.

But wait, why did I say "mostly"? Well, not everyone has a retpoline
compiler yet... but OK, screw them; they need to update.

Then there's Skylake, and that generation of CPU cores. For complicated
reasons they actually end up being vulnerable not just on indirect
branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
in a deep chain).

The IBRS solution, ugly though it is, did address that. Retpoline
doesn't. There are patches being floated to detect and prevent deep
stacks, and deal with some of the other special cases that bite on SKL,
but those are icky too. And in fact IBRS performance isn't anywhere
near as bad on this generation of CPUs as it is on earlier CPUs
*anyway*, which makes it not quite so insane to *contemplate* using it
as Intel proposed.

That's why my initial idea, as implemented in this RFC patchset, was to
stick with IBRS on Skylake, and use retpoline everywhere else. I'll
give you "garbage patches", but they weren't being "just mindlessly
sent around". If we're going to drop IBRS support and accept the
caveats, then let's do it as a conscious decision having seen what it
would look like, not just drop it quietly because poor Davey is too
scared that Linus might shout at him again. :)
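The selection logic described over the last few paragraphs can be summarised as a small decision function: IBRS on Skylake-era cores when the microcode provides it, retpoline everywhere else, nothing extra for kernels built without a retpoline compiler, and IBPB on context switch/vmexit whenever available. This is one reading of the RFC's stated policy, not its code. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct mitigation_inputs {
	bool retpoline_compiler;	/* kernel built with retpoline support */
	bool skylake_era;		/* RET is also affected (deep call chains) */
	bool have_ibrs;			/* microcode exposes IBRS */
	bool have_ibpb;			/* microcode exposes IBPB */
};

enum kernel_protection {
	PROTECT_NONE,		/* no usable mitigation: vulnerable */
	PROTECT_RETPOLINE,	/* partial on Skylake-era cores */
	PROTECT_IBRS,		/* IBRS set on every kernel entry */
};

struct mitigation_plan {
	enum kernel_protection kernel;
	bool ibpb_on_switch;	/* IBPB on context switch and vmexit */
};

static struct mitigation_plan choose_mitigation(struct mitigation_inputs in)
{
	struct mitigation_plan p = { PROTECT_NONE, in.have_ibpb };

	if (in.skylake_era && in.have_ibrs)
		p.kernel = PROTECT_IBRS;	/* retpoline does not cover RET */
	else if (in.retpoline_compiler)
		p.kernel = PROTECT_RETPOLINE;
	/* else: no retpoline compiler, "they need to update" */
	return p;
}
```

Dropping IBRS support, as discussed, corresponds to deleting the first branch, so Skylake-era cores fall through to retpoline and accept the caveats.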

I have seen *hand-wavy* analyses of the Skylake thing that mean I'm not
actually lying awake at night fretting about it, but nothing concrete
that really says it's OK.

If you view retpoline as a performance optimisation, which is how it
first arrived, then it's rather unconventional to say "well, it only
opens a *little* bit of a security hole but it does go nice and fast so
let's do it".

But fine, I'm content with ditching the use of IBRS to protect the
kernel, and I'm not even surprised. There's a *reason* we put it last
in the series, as both the most contentious and most dispensable part.
I'd be *happier* with a coherent analysis showing Skylake is still OK,
but hey-ho, screw Skylake.

The early part of the series adds the new feature bits and detects when
it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
supports the IBPB barrier that we need to make retpoline complete. That
much I think we definitely *do* want. There have been a bunch of us
working on this behind the scenes; one of us will probably post that
bit in the next day or so.

I think we also want to expose IBRS to VM guests, even if we don't use
it ourselves. Because Windows guests (and RHEL guests; yay!) do use it.

If we can be done with the shouty part, I'd actually quite like to have
a sensible discussion about when, if ever, we do IBPB on context switch
(ptraceability and dumpable have both been suggested) and when, if
ever, we set STIBP in userspace.
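To make the two suggestions concrete, here is one hypothetical way the context-switch decision could be framed. IBPB is skipped when staying in the same address space, and issued when the incoming task is not dumpable or when the outgoing task could not ptrace it. The ptrace check is crudely approximated by a uid comparison, a large simplification of the real permission rules, and `struct task_model` and `want_ibpb_on_switch()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the task state such a policy might consult. */
struct task_model {
	int mm_id;		/* address-space identity */
	unsigned int uid;	/* stand-in for ptrace permission checks */
	bool dumpable;		/* false for e.g. setuid programs */
};

static bool want_ibpb_on_switch(const struct task_model *prev,
				const struct task_model *next)
{
	if (prev->mm_id == next->mm_id)
		return false;	/* threads of one process share predictions anyway */
	if (!next->dumpable)
		return true;	/* protect non-dumpable tasks unconditionally */
	return prev->uid != next->uid;	/* approximates "prev cannot ptrace next" */
}
```

The trade-off is cost versus coverage: at roughly 4000 cycles per IBPB, issuing it only on these transitions avoids paying it on every switch between mutually trusting tasks.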

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 22:00           ` David Woodhouse
@ 2018-01-21 22:27             ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2018-01-21 22:27 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
	Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
	Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>>
>> The patches do things like add the garbage MSR writes to the kernel
>> entry/exit points. That's insane. That says "we're trying to protect
>> the kernel".  We already have retpoline there, with less overhead.
>
> You're looking at IBRS usage, not IBPB. They are different things.

Ehh. Odd intel naming detail.

If you look at this series, it very much does that kernel entry/exit
stuff. It was patch 10/10, iirc. In fact, the patch I was replying to
was explicitly setting that garbage up.

And I really don't want to see these garbage patches just mindlessly
sent around.

                  Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 21:35         ` Linus Torvalds
@ 2018-01-21 22:00           ` David Woodhouse
  -1 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-21 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
	Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
	Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On Sun, 2018-01-21 at 13:35 -0800, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 12:28 PM, David Woodhouse  wrote:
> > As a hack for existing CPUs, it's just about tolerable — as long as it
> > can die entirely by the next generation.
>
> That's part of the big problem here. The speculation control cpuid
> stuff shows that Intel actually seems to plan on doing the right thing
> for meltdown (the main question being _when_). Which is not a huge
> surprise, since it should be easy to fix, and it's a really honking
> big hole to drive through. Not doing the right thing for meltdown
> would be completely unacceptable.
> 
> So the IBRS garbage implies that Intel is _not_ planning on doing the
> right thing for the indirect branch speculation.
> 
> Honestly, that's completely unacceptable too.

Agreed. I've been saying that since I first saw the IBRS_ALL proposal.
There's *no* good reason for it to be opt-in. Just fix it!

> > So the part I think is odd is the IBRS_ALL feature, where a future
> > CPU will advertise "I am able to be not broken" and then you have to
> > set the IBRS bit once at boot time to *ask* it not to be broken. That
> > part is weird, because it ought to have been treated like the RDCL_NO
> > bit — just "you don't have to worry any more, it got better".
>
> It's not "weird" at all. It's very much part of the whole "this is
> complete garbage" issue.
> 
> The whole IBRS_ALL feature to me very clearly says "Intel is not
> serious about this, we'll have an ugly hack that will be so expensive
> that we don't want to enable it by default, because that would look
> bad in benchmarks".
> 
> So instead they try to push the garbage down to us. And they are doing
> it entirely wrong, even from a technical standpoint.

Right. The whole IBRS/IBPB thing as a nasty hack in the short term I
could live with, but it's the long-term implications of IBRS_ALL that
I'm unhappy about.

My understanding was that the IBRS_ALL performance was supposed to not
suck — to the extent that we'd just turn it on and then ALTERNATIVE out
the retpolines, and that would be the best option.

But if that's the case, why are they making it an option, and not just
doing the same as RDCL_NO does for "we fixed Meltdown"?

> > We do need the IBPB feature to complete the protection that retpoline
> > gives us — it's that or rebuild all of userspace with retpoline.
>
> BULLSHIT.
> 
> Have you _looked_ at the patches you are talking about?  You should
> have - several of them bear your name.
> 
> The patches do things like add the garbage MSR writes to the kernel
> entry/exit points. That's insane. That says "we're trying to protect
> the kernel".  We already have retpoline there, with less overhead.

You're looking at IBRS usage, not IBPB. They are different things.

Yes, the one you're looking at really *is* trying to protect the
kernel, and you're right that it's largely redundant with retpoline.
(Assuming we can live with the implications on Skylake, as I said.)

> If this was about flushing the BTB at actual context switches between
> different users, I'd believe you. But that's not at all what the
> patches do.

That's what the *IBPB* patches do. Those were deliberately put first in
the series (and in fact that's where I stopped, when I posted).

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 20:28       ` David Woodhouse
@ 2018-01-21 21:35         ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2018-01-21 21:35 UTC (permalink / raw)
  To: David Woodhouse
  Cc: KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
	Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
	Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
	the arch/x86 maintainers, Arjan Van De Ven

On Sun, Jan 21, 2018 at 12:28 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
>> All of this is pure garbage.
>>
>> Is Intel really planning on making this shit architectural? Has
>> anybody talked to them and told them they are f*cking insane?
>>
>> Please, any Intel engineers here - talk to your managers.
>
> If the alternative was a two-decade product recall and giving everyone
> free CPUs, I'm not sure it was entirely insane.

You seem to have bought into the Kool-Aid. Please add a healthy dose
of critical thinking. Because this isn't the kind of Kool-Aid that
makes for a fun trip with pretty pictures. This is the kind that melts
your brain.

> Certainly it's a nasty hack, but hey — the world was on fire and in the
> end we didn't have to just turn the datacentres off and go back to goat
> farming, so it's not all bad.

It's not that it's a nasty hack. It's much worse than that.

> As a hack for existing CPUs, it's just about tolerable — as long as it
> can die entirely by the next generation.

That's part of the big problem here. The speculation control cpuid
stuff shows that Intel actually seems to plan on doing the right thing
for meltdown (the main question being _when_). Which is not a huge
surprise, since it should be easy to fix, and it's a really honking
big hole to drive through. Not doing the right thing for meltdown
would be completely unacceptable.

So the IBRS garbage implies that Intel is _not_ planning on doing the
right thing for the indirect branch speculation.

Honestly, that's completely unacceptable too.

> So the part I think is odd is the IBRS_ALL feature, where a future
> CPU will advertise "I am able to be not broken" and then you have to
> set the IBRS bit once at boot time to *ask* it not to be broken. That
> part is weird, because it ought to have been treated like the RDCL_NO
> bit — just "you don't have to worry any more, it got better".

It's not "weird" at all. It's very much part of the whole "this is
complete garbage" issue.

The whole IBRS_ALL feature to me very clearly says "Intel is not
serious about this, we'll have an ugly hack that will be so expensive
that we don't want to enable it by default, because that would look
bad in benchmarks".

So instead they try to push the garbage down to us. And they are doing
it entirely wrong, even from a technical standpoint.

I'm sure there is some lawyer there who says "we'll have to go through
motions to protect against a lawsuit". But legal reasons do not make
for good technology, or good patches that I should apply.

> We do need the IBPB feature to complete the protection that retpoline
> gives us — it's that or rebuild all of userspace with retpoline.

BULLSHIT.

Have you _looked_ at the patches you are talking about?  You should
have - several of them bear your name.

The patches do things like add the garbage MSR writes to the kernel
entry/exit points. That's insane. That says "we're trying to protect
the kernel".  We already have retpoline there, with less overhead.

So somebody isn't telling the truth here. Somebody is pushing complete
garbage for unclear reasons. Sorry for having to point that out.

If this was about flushing the BTB at actual context switches between
different users, I'd believe you. But that's not at all what the
patches do.

As it is, the patches are COMPLETE AND UTTER GARBAGE.

They do literally insane things. They do things that do not make
sense. That makes all your arguments questionable and suspicious. The
patches do things that are not sane.

WHAT THE F*CK IS GOING ON?

And that's actually ignoring the much _worse_ issue, namely that the
whole hardware interface is literally mis-designed by morons.

It's mis-designed for two major reasons:

 - the "the interface implies Intel will never fix it" reason.

   See the difference between IBRS_ALL and RDCL_NO. One implies Intel
will fix something. The other does not.

   Do you really think that is acceptable?

 - the "there is no performance indicator".

   The whole point of having cpuid and flags from the
microarchitecture is that we can use those to make decisions.

   But since we already know that the IBRS overhead is _huge_ on
existing hardware, all those hardware capability bits are just
complete and utter garbage. Nobody sane will use them, since the cost
is too damn high. So you end up having to look at "which CPU stepping
is this" anyway.

I think we need something better than this garbage.

                Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-21 19:34   ` Linus Torvalds
@ 2018-01-21 20:28       ` David Woodhouse
  0 siblings, 0 replies; 120+ messages in thread
From: David Woodhouse @ 2018-01-21 20:28 UTC (permalink / raw)
  To: Linus Torvalds, KarimAllah Ahmed
  Cc: Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
	Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick,
	Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu,
	Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
	Arjan Van De Ven

On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
> All of this is pure garbage.
> 
> Is Intel really planning on making this shit architectural? Has
> anybody talked to them and told them they are f*cking insane?
> 
> Please, any Intel engineers here - talk to your managers. 

If the alternative was a two-decade product recall and giving everyone
free CPUs, I'm not sure it was entirely insane.

Certainly it's a nasty hack, but hey — the world was on fire and in the
end we didn't have to just turn the datacentres off and go back to goat
farming, so it's not all bad.

As a hack for existing CPUs, it's just about tolerable — as long as it
can die entirely by the next generation.

So the part I think is odd is the IBRS_ALL feature, where a future
CPU will advertise "I am able to be not broken" and then you have to
set the IBRS bit once at boot time to *ask* it not to be broken. That
part is weird, because it ought to have been treated like the RDCL_NO
bit — just "you don't have to worry any more, it got better".

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

We do need the IBPB feature to complete the protection that retpoline
gives us — it's that or rebuild all of userspace with retpoline.

We'll also want to expose IBRS to VM guests, since Windows uses it.

I think we could probably live without the IBRS frobbing in our own
syscall/interrupt paths, as long as we're prepared to live with the
very hypothetical holes that still exist on Skylake. Because I like
IBRS more... no, let me rephrase... I hate IBRS less than I hate the
'deepstack' and other stuff that was being proposed to make Skylake
almost safe with retpoline.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-20 19:23   ` KarimAllah Ahmed
  (?)
  (?)
@ 2018-01-21 19:34   ` Linus Torvalds
  2018-01-21 20:28       ` David Woodhouse
  -1 siblings, 1 reply; 120+ messages in thread
From: Linus Torvalds @ 2018-01-21 19:34 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
	Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick,
	Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse,
	Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
	Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
	Arjan Van De Ven

All of this is pure garbage.

Is Intel really planning on making this shit architectural? Has anybody
talked to them and told them they are f*cking insane?

Please, any Intel engineers here - talk to your managers.

        Linus

On Jan 20, 2018 11:23, "KarimAllah Ahmed" <karahmed@amazon.de> wrote:

> From: Tim Chen <tim.c.chen@linux.intel.com>
>
> Create macros to control Indirect Branch Speculation.
>
> Name them so they reflect what they are actually doing.
> The macros are used to restrict and unrestrict the indirect branch
> speculation.
> They do not *disable* (or *enable*) indirect branch speculation. A trip back to
> user-space after *restricting* speculation would still affect the BTB.
>
> Quoting from a commit by Tim Chen:
>
> """
>     If IBRS is set, near returns and near indirect jumps/calls will not allow
>     their predicted target address to be controlled by code that executed in a
>     less privileged prediction mode *BEFORE* the IBRS mode was last written with
>     a value of 1 or on another logical processor so long as all Return Stack
>     Buffer (RSB) entries from the previous less privileged prediction mode are
>     overwritten.
>
>     Thus a near indirect jump/call/return may be affected by code in a less
>     privileged prediction mode that executed *AFTER* IBRS mode was last written
>     with a value of 1.
> """
>
> [ tglx: Changed macro names and rewrote changelog ]
> [ karahmed: changed macro names *again* and rewrote changelog ]
>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 3f48f69..5aafb51 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -6,6 +6,8 @@
>  #include <asm/percpu.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/processor-flags.h>
> +#include <asm/msr-index.h>
> +#include <asm/cpufeatures.h>
>
>  /*
>
> @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
>  .Lafter_call_\@:
>  #endif
>  .endm
> +
> +/*
> + * IBRS related macros
> + */
> +.macro PUSH_MSR_REGS
> +       pushq   %rax
> +       pushq   %rcx
> +       pushq   %rdx
> +.endm
> +
> +.macro POP_MSR_REGS
> +       popq    %rdx
> +       popq    %rcx
> +       popq    %rax
> +.endm
> +
> +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
> +       movl    \msr_nr, %ecx
> +       movl    \edx_val, %edx
> +       movl    \eax_val, %eax
> +       wrmsr
> +.endm
> +
> +.macro RESTRICT_IB_SPEC
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       PUSH_MSR_REGS
> +       WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
> +       POP_MSR_REGS
> +.Lskip_\@:
> +.endm
> +
> +.macro UNRESTRICT_IB_SPEC
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       PUSH_MSR_REGS
> +       WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
> +       POP_MSR_REGS
> +.Lskip_\@:
> +.endm
> +
> +.macro RESTRICT_IB_SPEC_CLOBBER
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
> +.Lskip_\@:
> +.endm
> +
> +.macro UNRESTRICT_IB_SPEC_CLOBBER
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
> +.Lskip_\@:
> +.endm
> +
> +.macro RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg:req
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       movl    $MSR_IA32_SPEC_CTRL, %ecx
> +       rdmsr
> +       movl    %eax, \save_reg
> +       movl    $0, %edx
> +       movl    $SPEC_CTRL_IBRS, %eax
> +       wrmsr
> +.Lskip_\@:
> +.endm
> +
> +.macro RESTORE_IB_SPEC_CLOBBER save_reg:req
> +       ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +       /* Set IBRS to the value saved in the save_reg */
> +       movl    $MSR_IA32_SPEC_CTRL, %ecx
> +       movl    $0, %edx
> +       movl    \save_reg, %eax
> +       wrmsr
> +.Lskip_\@:
> +.endm
> --
> 2.7.4
>
>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-20 19:23   ` KarimAllah Ahmed
@ 2018-01-21 19:14     ` Andy Lutomirski
  -1 siblings, 0 replies; 120+ messages in thread
From: Andy Lutomirski @ 2018-01-21 19:14 UTC (permalink / raw)
  To: KarimAllah Ahmed
  Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
	Arjan Van De Ven



> On Jan 20, 2018, at 11:23 AM, KarimAllah Ahmed <karahmed@amazon.de> wrote:
> 
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Create macros to control Indirect Branch Speculation.
> 
> Name them so they reflect what they are actually doing.
> The macros are used to restrict and unrestrict the indirect branch speculation.
> They do not *disable* (or *enable*) indirect branch speculation. A trip back to
> user-space after *restricting* speculation would still affect the BTB.
> 
> Quoting from a commit by Tim Chen:
> 
> """
>    If IBRS is set, near returns and near indirect jumps/calls will not allow
>    their predicted target address to be controlled by code that executed in a
>    less privileged prediction mode *BEFORE* the IBRS mode was last written with
>    a value of 1 or on another logical processor so long as all Return Stack
>    Buffer (RSB) entries from the previous less privileged prediction mode are
>    overwritten.
> 
>    Thus a near indirect jump/call/return may be affected by code in a less
>    privileged prediction mode that executed *AFTER* IBRS mode was last written
>    with a value of 1.
> """
> 
> [ tglx: Changed macro names and rewrote changelog ]
> [ karahmed: changed macro names *again* and rewrote changelog ]
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
> 
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 3f48f69..5aafb51 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -6,6 +6,8 @@
> #include <asm/percpu.h>
> #include <asm/asm-offsets.h>
> #include <asm/processor-flags.h>
> +#include <asm/msr-index.h>
> +#include <asm/cpufeatures.h>
> 
> /*
> 
> @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
> .Lafter_call_\@:
> #endif
> .endm
> +
> +/*
> + * IBRS related macros
> + */
> +.macro PUSH_MSR_REGS
> +    pushq    %rax
> +    pushq    %rcx
> +    pushq    %rdx
> +.endm
> +
> +.macro POP_MSR_REGS
> +    popq    %rdx
> +    popq    %rcx
> +    popq    %rax
> +.endm
> +
> +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
> +    movl    \msr_nr, %ecx
> +    movl    \edx_val, %edx
> +    movl    \eax_val, %eax
> +    wrmsr
> +.endm
> +
> +.macro RESTRICT_IB_SPEC
> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +    PUSH_MSR_REGS
> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
> +    POP_MSR_REGS
> +.Lskip_\@:
> +.endm
> +
> +.macro UNRESTRICT_IB_SPEC
> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> +    PUSH_MSR_REGS
> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0

I think you should be writing 2, not 0, since I'm reasonably confident that we want STIBP on.  Can you explain why you're writing 0?

Also, holy cow, there are so many macros here.

And a meta question: why are there so many submitters of the same series?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  2018-01-20 19:22 [RFC 00/10] Speculation Control feature support KarimAllah Ahmed
@ 2018-01-20 19:23   ` KarimAllah Ahmed
  0 siblings, 0 replies; 120+ messages in thread
From: KarimAllah Ahmed @ 2018-01-20 19:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
	Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
	Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
	H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
	Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
	Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
	Radim Krčmář,
	Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
	Arjan Van De Ven

From: Tim Chen <tim.c.chen@linux.intel.com>

Create macros to control Indirect Branch Speculation.

Name them so they reflect what they are actually doing.
The macros are used to restrict and unrestrict the indirect branch speculation.
They do not *disable* (or *enable*) indirect branch speculation. A trip back to
user-space after *restricting* speculation would still affect the BTB.

Quoting from a commit by Tim Chen:

"""
    If IBRS is set, near returns and near indirect jumps/calls will not allow
    their predicted target address to be controlled by code that executed in a
    less privileged prediction mode *BEFORE* the IBRS mode was last written with
    a value of 1 or on another logical processor so long as all Return Stack
    Buffer (RSB) entries from the previous less privileged prediction mode are
    overwritten.

    Thus a near indirect jump/call/return may be affected by code in a less
    privileged prediction mode that executed *AFTER* IBRS mode was last written
    with a value of 1.
"""

[ tglx: Changed macro names and rewrote changelog ]
[ karahmed: changed macro names *again* and rewrote changelog ]

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 3f48f69..5aafb51 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,8 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+#include <asm/cpufeatures.h>
 
 /*
 
@@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
 .Lafter_call_\@:
 #endif
 .endm
+
+/*
+ * IBRS related macros
+ */
+.macro PUSH_MSR_REGS
+	pushq	%rax
+	pushq	%rcx
+	pushq	%rdx
+.endm
+
+.macro POP_MSR_REGS
+	popq	%rdx
+	popq	%rcx
+	popq	%rax
+.endm
+
+.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
+	movl	\msr_nr, %ecx
+	movl	\edx_val, %edx
+	movl	\eax_val, %eax
+	wrmsr
+.endm
+
+.macro RESTRICT_IB_SPEC
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	PUSH_MSR_REGS
+	WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+	POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	PUSH_MSR_REGS
+	WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+	POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_CLOBBER
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC_CLOBBER
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg:req
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	movl	$MSR_IA32_SPEC_CTRL, %ecx
+	rdmsr
+	movl	%eax, \save_reg
+	movl	$0, %edx
+	movl	$SPEC_CTRL_IBRS, %eax
+	wrmsr
+.Lskip_\@:
+.endm
+
+.macro RESTORE_IB_SPEC_CLOBBER save_reg:req
+	ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+	/* Set IBRS to the value saved in the save_reg */
+	movl    $MSR_IA32_SPEC_CTRL, %ecx
+	movl    $0, %edx
+	movl    \save_reg, %eax
+	wrmsr
+.Lskip_\@:
+.endm
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2018-02-06  9:14 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-26  2:50 [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation Liran Alon
2018-01-26  2:55 ` Van De Ven, Arjan
  -- strict thread matches above, loose matches on Subject: below --
2018-01-26  2:11 Liran Alon
2018-01-26  2:23 ` Dave Hansen
2018-01-26  9:11   ` David Woodhouse
2018-01-26 17:19     ` Linus Torvalds
2018-01-26 17:27       ` Borislav Petkov
2018-01-26 17:29       ` David Woodhouse
2018-01-26 17:31         ` David Woodhouse
2018-01-26 17:59       ` Andi Kleen
2018-01-26 18:11         ` David Woodhouse
2018-01-26 18:12           ` Arjan van de Ven
2018-01-26 18:26             ` David Woodhouse
2018-01-26 18:28               ` Van De Ven, Arjan
2018-01-26 18:43                 ` David Woodhouse
2018-01-26 18:44                   ` Van De Ven, Arjan
2018-01-26 18:53                     ` David Woodhouse
2018-01-26 19:02         ` Konrad Rzeszutek Wilk
2018-01-26 19:11           ` Hansen, Dave
2018-01-27 13:42             ` Konrad Rzeszutek Wilk
2018-01-27 15:55               ` Dave Hansen
2018-01-26 19:11           ` David Woodhouse
2018-01-26  8:46 ` David Woodhouse
2018-01-23 11:13 Liran Alon
2018-01-25 22:20 ` Dave Hansen
2018-01-22 22:15 Luke Kenneth Casson Leighton
2018-01-20 19:22 [RFC 00/10] Speculation Control feature support KarimAllah Ahmed
2018-01-20 19:23 ` [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation KarimAllah Ahmed
2018-01-21 19:14   ` Andy Lutomirski
2018-01-23 16:12     ` Tom Lendacky
2018-01-23 16:20       ` Woodhouse, David
2018-01-23 22:37         ` Tom Lendacky
2018-01-23 22:49           ` Andi Kleen
2018-01-23 23:14             ` Woodhouse, David
2018-01-23 23:22               ` Andi Kleen
2018-01-24  0:47               ` Tim Chen
2018-01-24  1:00                 ` Andy Lutomirski
2018-01-24  1:22                   ` David Woodhouse
2018-01-24  1:59                   ` Van De Ven, Arjan
2018-01-24  3:25                     ` Andy Lutomirski
2018-01-21 19:34   ` Linus Torvalds
2018-01-21 20:28     ` David Woodhouse
2018-01-21 21:35       ` Linus Torvalds
2018-01-21 22:00         ` David Woodhouse
2018-01-21 22:27           ` Linus Torvalds
2018-01-22 16:27             ` David Woodhouse
2018-01-23  7:29               ` Ingo Molnar
2018-01-23  7:53                 ` Ingo Molnar
2018-01-23  9:27                   ` Ingo Molnar
2018-01-23  9:37                     ` David Woodhouse
2018-01-23 15:01                     ` Dave Hansen
2018-01-23  9:30                   ` David Woodhouse
2018-01-23 10:15                     ` Ingo Molnar
2018-01-23 10:27                       ` David Woodhouse
2018-01-23 10:44                         ` Ingo Molnar
2018-01-23 10:57                           ` David Woodhouse
2018-01-23 10:23                     ` Ingo Molnar
2018-01-23 10:35                       ` David Woodhouse
2018-02-04 18:43                       ` Thomas Gleixner
2018-02-04 20:22                         ` David Woodhouse
2018-02-06  9:14                         ` David Woodhouse
2018-01-25 16:19                     ` Mason
2018-01-25 17:16                       ` Greg Kroah-Hartman
2018-01-29 11:59                         ` Mason
2018-01-24  0:05                 ` Andi Kleen
2018-01-23 20:16       ` Pavel Machek
