linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Laight <David.Laight@ACULAB.COM>
To: 'Peter Zijlstra' <peterz@infradead.org>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"jpoimboe@redhat.com" <jpoimboe@redhat.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"mbenes@suse.com" <mbenes@suse.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 03/14] x86/retpoline: Simplify retpolines
Date: Mon, 22 Mar 2021 15:41:56 +0000	[thread overview]
Message-ID: <b8e8cea6ac884c04b8c9e7a479fd2208@AcuMS.aculab.com> (raw)
In-Reply-To: <YFhkRN8dxHqItHy8@hirez.programming.kicks-ass.net>

From: Peter Zijlstra
> Sent: 22 March 2021 09:33
> 
> On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 18 March 2021 17:11
> > >
> > > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > > in alternatives"), it is possible to simplify the retpolines.
> > >
> > ...
> > > Notice that since the longest alternative sequence is now:
> > >
> > >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> > >    5:   f3 90                   pause
> > >    7:   0f ae e8                lfence
> > >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> > >    c:   48 89 04 24             mov    %rax,(%rsp)
> > >   10:   c3                      retq
> > >
> > > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > > if we can shrink the retpoline by 1 byte we can pack it more dense)
> >
> > I'm intrigued about the lfence after the pause.
> > Clearly this is for very warped cpu behaviour.
> > To get to the pause you have to be speculating past an
> > unconditional call.
> 
> Please read up on retpoline... That's the speculation trap. The warped
> CPU behaviour is called Spectre-v2.

There is 'warped' and 'very warped' :-)
Most of Spectre-v2 (don't search for Spectra v2 by mistake) is avoiding
the indirect branch prediction - which I knew.

I think the 'pause' is only executed is the cpu speculates through
the mov and retq; rather the speculating past the 'call' - which
some ARM cpu seem to do.

> For others, the obvious alternative is the below; and possibly we could
> then also remove the loop.

Another alternative is 'hlt' with the loop (rather than int3).

> The original retpoline, as per Paul's article has: 1: pause; jmp 1b;.
> That is, it lacks the LFENCE we have and would also fit 16 bytes.

Yes. Just 'jmps .' ought to be enough to block any side effects
of speculative execution.
Adding 'pause' is 'a good idea' for any spin loop.

There might be another lurking performance issue.
Skylake increased the execution time of pause from ~10 to ~140 clocks.
Reading between the lines I suspect this applies to speculatively
executed pause.
If that happens on a regular basis it might be noticeable.
So it may even be best to remove the pause!

As you say, the original retpoline lacked the lfence.
The only 'load' instruction in that sequence is the 'retq'.
It has to be said that given all (normal) loads are executed
in program order and all (normal) stores are also executed
in program order I've never actually seen what either
lfence or sfence actually do for you.
(mfence synchronises reads and writes - so may be useful.)
The (pre spectre) copies of the intel pdf's I have don't say
anything about lfence being any kind of a barrier against
speculative memory reads.

If the retpoline doesn't fit in 16 bytes it is almost certainly
(probably) worth putting the called label at offset 16.
This would mean that there is only one 16-byte code block
read from the call target.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


  reply	other threads:[~2021-03-22 15:42 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-18 17:11 [PATCH v2 00/14] x86,objtool: Optimize !RETPOLINE Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 01/14] x86: Add insn_decode_kernel() Peter Zijlstra
2021-03-19 10:40   ` Borislav Petkov
2021-03-18 17:11 ` [PATCH v2 02/14] x86/alternatives: Optimize optimize_nops() Peter Zijlstra
2021-03-21 12:06   ` Borislav Petkov
2021-03-22  8:17     ` Peter Zijlstra
2021-03-22 11:07       ` Borislav Petkov
2021-03-18 17:11 ` [PATCH v2 03/14] x86/retpoline: Simplify retpolines Peter Zijlstra
2021-03-19 17:18   ` David Laight
2021-03-22  9:32     ` Peter Zijlstra
2021-03-22 15:41       ` David Laight [this message]
2021-03-18 17:11 ` [PATCH v2 04/14] objtool: Correctly handle retpoline thunk calls Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 05/14] objtool: Per arch retpoline naming Peter Zijlstra
2021-03-19  2:38   ` Josh Poimboeuf
2021-03-19  9:07     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 06/14] objtool: Fix static_call list generation Peter Zijlstra
2021-03-22 12:44   ` Miroslav Benes
2021-03-18 17:11 ` [PATCH v2 07/14] objtool: Rework rebuild_reloc logic Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 08/14] objtool: Add elf_create_reloc() helper Peter Zijlstra
2021-03-19  1:42   ` Josh Poimboeuf
2021-03-19  9:47     ` Peter Zijlstra
2021-03-19 15:12       ` Josh Poimboeuf
2021-03-19 15:24         ` Peter Zijlstra
2021-03-19 15:37           ` Josh Poimboeuf
2021-03-18 17:11 ` [PATCH v2 09/14] objtool: Extract elf_strtab_concat() Peter Zijlstra
2021-03-19  2:10   ` Josh Poimboeuf
2021-03-19  9:52     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 10/14] objtool: Extract elf_symbol_add() Peter Zijlstra
2021-03-19  2:14   ` Josh Poimboeuf
2021-03-19  9:54     ` Peter Zijlstra
2021-03-19 15:04       ` Josh Poimboeuf
2021-03-18 17:11 ` [PATCH v2 11/14] objtool: Add elf_create_undef_symbol() Peter Zijlstra
2021-03-19  2:29   ` Josh Poimboeuf
2021-03-19  7:56     ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 12/14] objtool: Allow archs to rewrite retpolines Peter Zijlstra
2021-03-19  2:54   ` Josh Poimboeuf
2021-03-19 11:21     ` Peter Zijlstra
2021-03-19 13:28       ` Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 13/14] objtool: Skip magical retpoline .altinstr_replacement Peter Zijlstra
2021-03-18 17:11 ` [PATCH v2 14/14] objtool,x86: Rewrite retpoline thunk calls Peter Zijlstra
2021-03-19  3:29   ` Josh Poimboeuf
2021-03-19  8:06     ` Peter Zijlstra
2021-03-19 15:30       ` Josh Poimboeuf
2021-03-19 15:56         ` Peter Zijlstra
2021-03-19 22:52           ` Josh Poimboeuf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8e8cea6ac884c04b8c9e7a479fd2208@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=jgross@suse.com \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbenes@suse.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).