linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
@ 2018-01-13  1:07 Tom Lendacky
  2018-01-13  1:53 ` Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Tom Lendacky @ 2018-01-13  1:07 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Rik van Riel, Andi Kleen, Josh Poimboeuf, Peter Zijlstra,
	Linus Torvalds, Jiri Kosina, Dan Williams, Dave Hansen,
	Borislav Petkov, Andy Lutomirski, Kees Cook, Thomas Gleixner,
	Tim Chen, Greg Kroah-Hartman, David Woodhouse, Paul Turner

The pause instruction is currently used in the retpoline and RSB filling
macros as a speculation trap.  The use of pause was originally suggested
because it showed a very, very small difference in the amount of
cycles/time used to execute the retpoline as compared to lfence.  On AMD,
the pause instruction is not a serializing instruction, so the pause/jmp
loop will use excess power as it is speculated over waiting for return
to mispredict to the correct target.

The RSB filling macro is applicable to AMD, and, if software is unable to
verify that lfence is serializing on AMD (possible when running under a
hypervisor), the generic retpoline support will be used and, so, is also
applicable to AMD.  Change the use of pause to lfence.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/nospec-branch.h |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 402a11c..2c4a09a 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -11,7 +11,7 @@
  * Fill the CPU return stack buffer.
  *
  * Each entry in the RSB, if used for a speculative 'ret', contains an
- * infinite 'pause; jmp' loop to capture speculative execution.
+ * infinite 'lfence; jmp' loop to capture speculative execution.
  *
  * This is required in various cases for retpoline and IBRS-based
  * mitigations for the Spectre variant 2 vulnerability. Sometimes to
@@ -37,12 +37,12 @@
 771:						\
 	call	772f;				\
 773:	/* speculation trap */			\
-	pause;					\
+	lfence;					\
 	jmp	773b;				\
 772:						\
 	call	774f;				\
 775:	/* speculation trap */			\
-	pause;					\
+	lfence;					\
 	jmp	775b;				\
 774:						\
 	dec	reg;				\
@@ -72,7 +72,7 @@
 .macro RETPOLINE_JMP reg:req
 	call	.Ldo_rop_\@
 .Lspec_trap_\@:
-	pause
+	lfence
 	jmp	.Lspec_trap_\@
 .Ldo_rop_\@:
 	mov	\reg, (%_ASM_SP)
@@ -164,7 +164,7 @@
 	"       jmp    904f;\n"					\
 	"       .align 16\n"					\
 	"901:	call   903f;\n"					\
-	"902:	pause;\n"					\
+	"902:	lfence;\n"					\
 	"       jmp    902b;\n"					\
 	"       .align 16\n"					\
 	"903:	addl   $4, %%esp;\n"				\

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13  1:07 [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros Tom Lendacky
@ 2018-01-13  1:53 ` Dan Williams
  2018-01-13  2:22   ` Tom Lendacky
  2018-01-13 10:33 ` [tip:x86/pti] x86/retpoline: Use LFENCE instead of PAUSE " tip-bot for Tom Lendacky
  2018-01-13 10:46 ` [PATCH v1] x86/retpoline: Use lfence " Woodhouse, David
  2 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2018-01-13  1:53 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: X86 ML, Linux Kernel Mailing List, Rik van Riel, Andi Kleen,
	Josh Poimboeuf, Peter Zijlstra, Linus Torvalds, Jiri Kosina,
	Dave Hansen, Borislav Petkov, Andy Lutomirski, Kees Cook,
	Thomas Gleixner, Tim Chen, Greg Kroah-Hartman, David Woodhouse,
	Paul Turner

On Fri, Jan 12, 2018 at 5:07 PM, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> The pause instruction is currently used in the retpoline and RSB filling
> macros as a speculation trap.  The use of pause was originally suggested
> because it showed a very, very small difference in the amount of
> cycles/time used to execute the retpoline as compared to lfence.  On AMD,
> the pause instruction is not a serializing instruction, so the pause/jmp
> loop will use excess power as it is speculated over waiting for return
> to mispredict to the correct target.
>
> The RSB filling macro is applicable to AMD, and, if software is unable to
> verify that lfence is serializing on AMD (possible when running under a
> hypervisor), the generic retpoline support will be used and, so, is also
> applicable to AMD.  Change the use of pause to lfence.

Should we use ASM_IFENCE for this?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13  1:53 ` Dan Williams
@ 2018-01-13  2:22   ` Tom Lendacky
  0 siblings, 0 replies; 8+ messages in thread
From: Tom Lendacky @ 2018-01-13  2:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: X86 ML, Linux Kernel Mailing List, Rik van Riel, Andi Kleen,
	Josh Poimboeuf, Peter Zijlstra, Linus Torvalds, Jiri Kosina,
	Dave Hansen, Borislav Petkov, Andy Lutomirski, Kees Cook,
	Thomas Gleixner, Tim Chen, Greg Kroah-Hartman, David Woodhouse,
	Paul Turner

On 1/12/2018 7:53 PM, Dan Williams wrote:
> On Fri, Jan 12, 2018 at 5:07 PM, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>> The pause instruction is currently used in the retpoline and RSB filling
>> macros as a speculation trap.  The use of pause was originally suggested
>> because it showed a very, very small difference in the amount of
>> cycles/time used to execute the retpoline as compared to lfence.  On AMD,
>> the pause instruction is not a serializing instruction, so the pause/jmp
>> loop will use excess power as it is speculated over waiting for return
>> to mispredict to the correct target.
>>
>> The RSB filling macro is applicable to AMD, and, if software is unable to
>> verify that lfence is serializing on AMD (possible when running under a
>> hypervisor), the generic retpoline support will be used and, so, is also
>> applicable to AMD.  Change the use of pause to lfence.
> 
> Should we use ASM_IFENCE for this?

I don't think we need to.  On bare-metal this will be fine.  When running
as a guest, the hypervisor should have made lfence serializing, and if it
hasn't, this is still better then pause.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:x86/pti] x86/retpoline: Use LFENCE instead of PAUSE in the retpoline/RSB filling RSB macros
  2018-01-13  1:07 [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros Tom Lendacky
  2018-01-13  1:53 ` Dan Williams
@ 2018-01-13 10:33 ` tip-bot for Tom Lendacky
  2018-01-13 10:46 ` [PATCH v1] x86/retpoline: Use lfence " Woodhouse, David
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Tom Lendacky @ 2018-01-13 10:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gregkh, bp, luto, jikos, tim.c.chen, peterz, linux-kernel,
	thomas.lendacky, torvalds, keescook, jpoimboe, tglx, dwmw, mingo,
	pjt, hpa, riel, dave.hansen, dan.j.williams

Commit-ID:  2eb9137c8744f9adf1670e9aa52850948a30112b
Gitweb:     https://git.kernel.org/tip/2eb9137c8744f9adf1670e9aa52850948a30112b
Author:     Tom Lendacky <thomas.lendacky@amd.com>
AuthorDate: Fri, 12 Jan 2018 19:07:28 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 13 Jan 2018 11:28:50 +0100

x86/retpoline: Use LFENCE instead of PAUSE in the retpoline/RSB filling RSB macros

The PAUSE instruction is currently used in the retpoline and RSB filling
macros as a speculation trap.  The use of PAUSE was originally suggested
because it showed a very, very small difference in the amount of
cycles/time used to execute the retpoline as compared to LFENCE.

On AMD, the PAUSE instruction is not a serializing instruction, so the
PAUSE/JMP loop will use excess power as it is speculated over waiting
for return to mispredict to the correct target.

The RSB filling macro is applicable to AMD, and, if software is unable to
verify that LFENCE is serializing on AMD (possible when running under a
hypervisor), the generic retpoline support will be used and, so, is also
applicable to AMD.  Change the use of PAUSE to LFENCE.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20180113010728.27928.8537.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/nospec-branch.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 402a11c..2c4a09a 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -11,7 +11,7 @@
  * Fill the CPU return stack buffer.
  *
  * Each entry in the RSB, if used for a speculative 'ret', contains an
- * infinite 'pause; jmp' loop to capture speculative execution.
+ * infinite 'lfence; jmp' loop to capture speculative execution.
  *
  * This is required in various cases for retpoline and IBRS-based
  * mitigations for the Spectre variant 2 vulnerability. Sometimes to
@@ -37,12 +37,12 @@
 771:						\
 	call	772f;				\
 773:	/* speculation trap */			\
-	pause;					\
+	lfence;					\
 	jmp	773b;				\
 772:						\
 	call	774f;				\
 775:	/* speculation trap */			\
-	pause;					\
+	lfence;					\
 	jmp	775b;				\
 774:						\
 	dec	reg;				\
@@ -72,7 +72,7 @@
 .macro RETPOLINE_JMP reg:req
 	call	.Ldo_rop_\@
 .Lspec_trap_\@:
-	pause
+	lfence
 	jmp	.Lspec_trap_\@
 .Ldo_rop_\@:
 	mov	\reg, (%_ASM_SP)
@@ -164,7 +164,7 @@
 	"       jmp    904f;\n"					\
 	"       .align 16\n"					\
 	"901:	call   903f;\n"					\
-	"902:	pause;\n"					\
+	"902:	lfence;\n"					\
 	"       jmp    902b;\n"					\
 	"       .align 16\n"					\
 	"903:	addl   $4, %%esp;\n"				\

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13  1:07 [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros Tom Lendacky
  2018-01-13  1:53 ` Dan Williams
  2018-01-13 10:33 ` [tip:x86/pti] x86/retpoline: Use LFENCE instead of PAUSE " tip-bot for Tom Lendacky
@ 2018-01-13 10:46 ` Woodhouse, David
  2018-01-13 14:07   ` Van De Ven, Arjan
  2 siblings, 1 reply; 8+ messages in thread
From: Woodhouse, David @ 2018-01-13 10:46 UTC (permalink / raw)
  To: Tom Lendacky, x86, linux-kernel, Van De Ven, Arjan, asit.k.mallick
  Cc: Rik van Riel, Andi Kleen, Josh Poimboeuf, Peter Zijlstra,
	Linus Torvalds, Jiri Kosina, Dan Williams, Dave Hansen,
	Borislav Petkov, Andy Lutomirski, Kees Cook, Thomas Gleixner,
	Tim Chen, Greg Kroah-Hartman, Paul Turner

[-- Attachment #1: Type: text/plain, Size: 2702 bytes --]

On Fri, 2018-01-12 at 19:07 -0600, Tom Lendacky wrote:
> The pause instruction is currently used in the retpoline and RSB filling
> macros as a speculation trap.  The use of pause was originally suggested
> because it showed a very, very small difference in the amount of
> cycles/time used to execute the retpoline as compared to lfence.  On AMD,
> the pause instruction is not a serializing instruction, so the pause/jmp
> loop will use excess power as it is speculated over waiting for return
> to mispredict to the correct target.
> 
> The RSB filling macro is applicable to AMD, and, if software is unable to
> verify that lfence is serializing on AMD (possible when running under a
> hypervisor), the generic retpoline support will be used and, so, is also
> applicable to AMD.  Change the use of pause to lfence.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

Conditionally-Acked-by: David Woodhouse <dwmw@amazon.co.uk>

The condition being, as noted, that I'd really like to see it acked by
Arjan/Asit and Paul.



> ---
>  arch/x86/include/asm/nospec-branch.h |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 402a11c..2c4a09a 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -11,7 +11,7 @@
>   * Fill the CPU return stack buffer.
>   *
>   * Each entry in the RSB, if used for a speculative 'ret', contains an
> - * infinite 'pause; jmp' loop to capture speculative execution.
> + * infinite 'lfence; jmp' loop to capture speculative execution.
>   *
>   * This is required in various cases for retpoline and IBRS-based
>   * mitigations for the Spectre variant 2 vulnerability. Sometimes to
> @@ -37,12 +37,12 @@
>  771:						\
>  	call	772f;				\
>  773:	/* speculation trap */			\
> -	pause;					\
> +	lfence;					\
>  	jmp	773b;				\
>  772:						\
>  	call	774f;				\
>  775:	/* speculation trap */			\
> -	pause;					\
> +	lfence;					\
>  	jmp	775b;				\
>  774:						\
>  	dec	reg;				\
> @@ -72,7 +72,7 @@
>  .macro RETPOLINE_JMP reg:req
>  	call	.Ldo_rop_\@
>  .Lspec_trap_\@:
> -	pause
> +	lfence
>  	jmp	.Lspec_trap_\@
>  .Ldo_rop_\@:
>  	mov	\reg, (%_ASM_SP)
> @@ -164,7 +164,7 @@
>  	"       jmp    904f;\n"					\
>  	"       .align 16\n"					\
>  	"901:	call   903f;\n"					\
> -	"902:	pause;\n"					\
> +	"902:	lfence;\n"					\
>  	"       jmp    902b;\n"					\
>  	"       .align 16\n"					\
>  	"903:	addl   $4, %%esp;\n"				\
> 
> 

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13 10:46 ` [PATCH v1] x86/retpoline: Use lfence " Woodhouse, David
@ 2018-01-13 14:07   ` Van De Ven, Arjan
  2018-01-13 21:13     ` Tom Lendacky
  0 siblings, 1 reply; 8+ messages in thread
From: Van De Ven, Arjan @ 2018-01-13 14:07 UTC (permalink / raw)
  To: Woodhouse, David, thomas.lendacky, linux-kernel, x86, Mallick, Asit K
  Cc: tim.c.chen, peterz, torvalds, tglx, jpoimboe, ak, Williams,
	Dan J, riel, keescook, luto, pjt, bp, Hansen, Dave, jikos,
	gregkh

> > The RSB filling macro is applicable to AMD, and, if software is unable to
> > verify that lfence is serializing on AMD (possible when running under a
> > hypervisor), the generic retpoline support will be used and, so, is also
> > applicable to AMD.  Change the use of pause to lfence.
> >
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Conditionally-Acked-by: David Woodhouse <dwmw@amazon.co.uk>


pause is technically the "save me power" instruction

how about a compromise where we do a double:

pause
lfence
jmp <up>

as sequence... that way if the branch recovery is fast, we get the performance of pause, but if it takes a while, on AMD you get the behavior of lfence?



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13 14:07   ` Van De Ven, Arjan
@ 2018-01-13 21:13     ` Tom Lendacky
  2018-01-13 21:36       ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Lendacky @ 2018-01-13 21:13 UTC (permalink / raw)
  To: Van De Ven, Arjan, Woodhouse, David, linux-kernel, x86, Mallick, Asit K
  Cc: tim.c.chen, peterz, torvalds, tglx, jpoimboe, ak, Williams,
	Dan J, riel, keescook, luto, pjt, bp, Hansen, Dave, jikos,
	gregkh

On 1/13/2018 8:07 AM, Van De Ven, Arjan wrote:
>>> The RSB filling macro is applicable to AMD, and, if software is unable to
>>> verify that lfence is serializing on AMD (possible when running under a
>>> hypervisor), the generic retpoline support will be used and, so, is also
>>> applicable to AMD.  Change the use of pause to lfence.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Conditionally-Acked-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> 
> pause is technically the "save me power" instruction
> 
> how about a compromise where we do a double:
> 
> pause
> lfence
> jmp <up>
> 
> as sequence... that way if the branch recovery is fast, we get the performance of pause, but if it takes a while, on AMD you get the behavior of lfence?

That should work on AMD.

Thanks,
Tom

> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros
  2018-01-13 21:13     ` Tom Lendacky
@ 2018-01-13 21:36       ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2018-01-13 21:36 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Van De Ven, Arjan, Woodhouse, David, linux-kernel, x86, Mallick,
	Asit K, tim.c.chen, peterz, torvalds, jpoimboe, ak, Williams,
	Dan J, riel, keescook, luto, pjt, bp, Hansen, Dave, jikos,
	gregkh

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

On Sat, 13 Jan 2018, Tom Lendacky wrote:

> On 1/13/2018 8:07 AM, Van De Ven, Arjan wrote:
> >>> The RSB filling macro is applicable to AMD, and, if software is unable to
> >>> verify that lfence is serializing on AMD (possible when running under a
> >>> hypervisor), the generic retpoline support will be used and, so, is also
> >>> applicable to AMD.  Change the use of pause to lfence.
> >>>
> >>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> Conditionally-Acked-by: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > 
> > pause is technically the "save me power" instruction
> > 
> > how about a compromise where we do a double:
> > 
> > pause
> > lfence
> > jmp <up>
> > 
> > as sequence... that way if the branch recovery is fast, we get the performance of pause, but if it takes a while, on AMD you get the behavior of lfence?
> 
> That should work on AMD.

I zapped the commit from tip for now until this discussion is resolved.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-01-13 21:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-13  1:07 [PATCH v1] x86/retpoline: Use lfence in the retpoline/RSB filling RSB macros Tom Lendacky
2018-01-13  1:53 ` Dan Williams
2018-01-13  2:22   ` Tom Lendacky
2018-01-13 10:33 ` [tip:x86/pti] x86/retpoline: Use LFENCE instead of PAUSE " tip-bot for Tom Lendacky
2018-01-13 10:46 ` [PATCH v1] x86/retpoline: Use lfence " Woodhouse, David
2018-01-13 14:07   ` Van De Ven, Arjan
2018-01-13 21:13     ` Tom Lendacky
2018-01-13 21:36       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).