On Thu, 2018-01-25 at 13:07 +0100, Borislav Petkov wrote:
> On Fri, Jan 12, 2018 at 03:37:49AM -0800, tip-bot for David Woodhouse wrote:
> >
> > +/*
> > + * On VMEXIT we must ensure that no RSB predictions learned in the guest
> > + * can be followed in the host, by overwriting the RSB completely. Both
> > + * retpoline and IBRS mitigations for Spectre v2 need this; only on future
> > + * CPUs with IBRS_ATT *might* it be avoided.
> > + */
> > +static inline void vmexit_fill_RSB(void)
> > +{
> > +#ifdef CONFIG_RETPOLINE
> > +	unsigned long loops = RSB_CLEAR_LOOPS / 2;
> > +
> > +	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
> > +		      ALTERNATIVE("jmp 910f",
> > +				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
> > +				  X86_FEATURE_RETPOLINE)
> > +		      "910:"
> > +		      : "=&r" (loops), ASM_CALL_CONSTRAINT
> > +		      : "r" (loops) : "memory" );
> > +#endif
> > +}
>
> Btw,
>
> this thing is a real pain to backport to older kernels without breaking
> kABI (I know, I know, the latter sucks but it is what it is)

I haven't had lunch yet, so I don't feel queasy and I'm vaguely
interested... *why* does it break kABI?

> so I'm thinking we could simplify that thing regardless.
>
> As it is, 41 bytes get replaced currently:
>
> [    0.437005] apply_alternatives: feat: 7*32+12, old: (        (ptrval), len: 43), repl: (        (ptrval), len: 43), pad: 41
> [    0.438885]         (ptrval): old_insn: eb 29 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [    0.440001]         (ptrval): rpl_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00
> [    0.444002]         (ptrval): final_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00
>
> Not that it doesn't work, but the fewer bytes we replace, the better.
>
> So it could be made to simply call two functions. The replacing then
> turns into:
>
> [    0.438154] apply_alternatives: feat: 7*32+12, old: (ffffffff81060b60, len: 5), repl: (ffffffff82434692, len: 5), pad: 0
> [    0.440002] ffffffff81060b60: old_insn: e8 ab 73 01 00
> [    0.441003] ffffffff82434692: rpl_insn: e8 89 38 c4 fe
> [    0.441966] apply_alternatives: Fix CALL offset: 0x173bb, CALL 0xffffffff81077f20
> [    0.444002] ffffffff81060b60: final_insn: e8 bb 73 01 00
>
> The "old_insn" above is:
>
> ffffffff81060b60:	e8 ab 73 01 00	callq  ffffffff81077f10 <__fill_rsb_nop>
>
> and the final_insn is:
>
> ffffffff82434692:	e8 89 38 c4 fe	callq  ffffffff81077f20 <__fill_rsb>
>
> so both are CALLs with immediates, for which there's no speculation
> going on.
>
> I had to add a new alternative_void_call() macro for that, but that's
> straightforward. Also, this causes objtool to segfault, so Josh and I
> need to look at that first.
>
> Other than that, it gets a lot simpler this way:
>
> --
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index cf5961ca8677..a84863c1b7d3 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -210,6 +210,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
>  	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
>  		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
>
> +/* Like alternative_io, but for replacing a direct call with another one. */
> +#define alternative_void_call(oldfunc, newfunc, feature, input...)	\
> +	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
> +		: : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)

But you aren't doing the call at all in the other case, and the
alternatives mechanism *always* handled the case where the first
'alternative' instruction was a branch, didn't it? So couldn't it just
be alternative(nop, call __fill_rsb_func)?

But I still don't understand why it matters.
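Concretely, that shape could be something like the sketch below, built
on the existing alternative_input() helper. This is a sketch only: the
__fill_rsb name and X86_FEATURE_RETPOLINE come from the thread, while
the use of ASM_NO_INPUT_CLOBBER() and the "memory" clobber are
assumptions.

/*
 * Minimal sketch: the empty oldinstr is padded with NOPs, so with
 * X86_FEATURE_RETPOLINE clear nothing runs at all; with it set,
 * apply_alternatives() patches in the 5-byte CALL and fixes up its
 * relative offset (the "Fix CALL offset" line in the log above).
 */
static inline void vmexit_fill_RSB(void)
{
#ifdef CONFIG_RETPOLINE
	alternative_input("",
			  "call __fill_rsb",
			  X86_FEATURE_RETPOLINE,
			  ASM_NO_INPUT_CLOBBER("memory"));
#endif
}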
> +void __fill_rsb(void)
> +{
> +	unsigned long loops;
> +
> +	asm volatile (__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1))
> +		      : "=r" (loops), ASM_CALL_CONSTRAINT
> +		      : : "memory" );
> +}
> +#endif

The out-of-line function should be __clear_rsb() if it's using
RSB_CLEAR_LOOPS, and __fill_rsb() if it's using RSB_FILL_LOOPS. I think
we're only using one right now but Andi at least is posting patches
which use the other, as part of the Skylake clusterfuck.
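That naming split might look like the following sketch, which simply
mirrors the quoted function, under the assumption that RSB_FILL_LOOPS
is already defined alongside RSB_CLEAR_LOOPS in nospec-branch.h:

/* Forcibly overwrite all RSB entries, as needed on vmexit. */
void __clear_rsb(void)
{
	unsigned long loops;

	asm volatile (__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1))
		      : "=r" (loops), ASM_CALL_CONSTRAINT
		      : : "memory");
}

/*
 * Top up the RSB to avoid underflow, for callers like the
 * Skylake-era patches mentioned above that want RSB_FILL_LOOPS.
 */
void __fill_rsb(void)
{
	unsigned long loops;

	asm volatile (__stringify(__FILL_RETURN_BUFFER(%0, RSB_FILL_LOOPS, %1))
		      : "=r" (loops), ASM_CALL_CONSTRAINT
		      : : "memory");
}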