On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
>
> Pretty much.
> Paul's writeup: https://support.google.com/faqs/answer/7625886
> tldr: jmp *%r11 gets converted to:
> call set_up_target;
> capture_spec:
>   pause;
>   jmp capture_spec;
> set_up_target:
>   mov %r11, (%rsp);
>   ret;
> where capture_spec part will be looping speculatively.

That is almost identical to what's in my latest patch set, except that
the capture_spec loop has 'lfence' instead of 'pause'.

As Andi says, I'd want to see explicit approval from the CPU architects
for making that change.

We've already had false starts there: for a long time, Intel thought
that a much simpler option with an lfence after the register load was
sufficient, and then eventually worked out that in some rare cases it
wasn't. While AMD still seem to think it *is* sufficient for them,
apparently.