Date: Fri, 28 Jul 2017 18:06:32 -0700
From: Matthias Kaehlcke
To: Josh Poimboeuf
Cc: Andrey Ryabinin, Chris J Arges, Borislav Petkov, Thomas Gleixner,
	Ingo Molnar, "H. Peter Anvin", x86@kernel.org, LKML,
	Douglas Anderson, Michael Davidson, Greg Hackmann,
	Nick Desaulniers, Stephen Hines, Kees Cook, Arnd Bergmann,
	Bernhard Rosenkränzer
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"
Message-ID: <20170729010632.GE84665@google.com>
References: <20170713211245.GG95735@google.com>
	<20170713213406.gx4ixkx6kxa4ppps@treble>
	<20170713215704.GJ95735@google.com>
	<20170719174630.kz5g553evcrnirmr@treble>
	<20170720151813.5wnpsb5wy7bqrpec@treble>
	<20170720205652.patvjnfqymrv73ho@treble>
	<20170729003852.GD84665@google.com>
	<20170729005521.2lhekao5fvb452oy@treble>
In-Reply-To: <20170729005521.2lhekao5fvb452oy@treble>

On Fri, Jul 28, 2017 at 07:55:21PM -0500, Josh Poimboeuf said:

> On Fri, Jul 28, 2017 at 05:38:52PM -0700, Matthias Kaehlcke wrote:
> > On Thu, Jul 20, 2017 at 03:56:52PM -0500, Josh Poimboeuf said:
> > 
> > > On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
> > > > FWIW, below is my understanding of what's going on.
> > > > 
> > > > It seems clang treats a local named register almost the same as an
> > > > ordinary local variable. The only difference is that before reading
> > > > the register variable, clang puts the variable's value into the
> > > > specified register.
> > > > 
> > > > So clang just assigns a stack slot for the variable __sp where it's
> > > > going to keep the variable's value. But since __sp is uninitialized
> > > > (we haven't assigned anything to it), the value of __sp is some
> > > > garbage from the stack. The inline asm specifies __sp as an input,
> > > > so clang assumes that it has to load __sp into 'rsp' because the
> > > > inline asm is going to use it. And it just loads garbage from the
> > > > stack into 'rsp'.
> > > > 
> > > > In fact, such behavior (I mean storing the value on the stack and
> > > > loading it into the register before use) is very useful: it allows
> > > > keeping the value assigned to a call-clobbered register across
> > > > function calls.
> > > > 
> > > > Unlike clang, gcc assigns the value to the register right away and
> > > > doesn't store it anywhere else. So if the register is
> > > > call-clobbered, you have to be absolutely sure that there is no
> > > > subsequent function call that might clobber it.
> > > > 
> > > > E.g. see some real examples:
> > > > https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda ("msm:
> > > > scm: Fix improper register assignment"). These bugs shouldn't
> > > > happen with clang.
> > > > 
> > > > But a global named register works slightly differently in clang.
> > > > For the global, the value is just the value of the register itself,
> > > > whatever it is. Read/write from a global named register is just
> > > > like a direct read/write to the register.
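For illustration, here is a minimal sketch of the construct under
discussion (the callee foo and the wrapper function are hypothetical,
not taken from the kernel or from the patch below; x86-64 assumed):

	void foo(void);	/* hypothetical callee, resolved at link time */

	static inline void call_foo(void)
	{
		/* local variable pinned to the stack pointer register */
		register void *__sp asm("rsp");

		/*
		 * GCC treats __sp as a direct alias for %rsp, so listing
		 * it as an output merely tells the compiler "this asm uses
		 * the stack pointer", which makes it set up the stack
		 * frame before the call.  Clang instead backs __sp with a
		 * stack slot like any other local variable; since nothing
		 * was ever stored to that slot, reloading __sp into %rsp
		 * before the asm loads stack garbage into the stack
		 * pointer.
		 */
		asm volatile("call foo" : "+r" (__sp));
	}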
> > > Thanks, that clears up a lot of the confusion for me.
> > > 
> > > Still, unfortunately, I don't think that's going to work for GCC.
> > > Changing the '__sp' register variable to global in the header file
> > > causes it to make a *bunch* of changes across the kernel, even in
> > > functions which don't do inline asm. It seems to be disabling some
> > > optimizations across the board.
> > > 
> > > I do have another idea, which is to replace all uses of
> > > 
> > >   asm(" ... call foo ... " : outputs : inputs : clobbers);
> > > 
> > > with a new ASM_CALL macro:
> > > 
> > >   ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);
> > > 
> > > Then the compiler differences can be abstracted out, with GCC adding
> > > "sp" as an output constraint and clang doing nothing (for now).
> > 
> > The idea sounds interesting; however, I see two issues with ASM_CALL():
> > 
> > Inline assembly expects the individual elements of outputs, inputs and
> > clobbers to be comma-separated, and so does the macro for its
> > parameters.
> 
> I think this isn't a problem, see the (untested and unfinished) patch
> below for an idea of how the arguments can be handled.

Good point!

> > The assembler template can refer to the positions of output and input
> > operands; adding "sp" as an output changes the positions of the inputs
> > with respect to the clang version.
> 
> True, though I think we could convert all ASM_CALL users to use named
> operands instead of the (bug-prone) numbered ones. It would be an
> improvement regardless.

Agreed, that sounds like a good solution and a desirable improvement.
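To make that concrete, here is a sketch (not part of the patch) of how
the get_user() call site from the patch below could look with named
operands instead of numbered ones; the operand names [ret], [val] and
[size] are made up for illustration:

	ASM_CALL("call __get_user_%P[size]",
		 ARGS([ret] "=a" (__ret_gu), [val] "=r" (__val_gu)),
		 ARGS("[ret]" (ptr),	/* named matching constraint, replaces "0" */
		      [size] "i" (sizeof(*(ptr)))));

Since nothing in the template refers to an operand by number, an extra
constraint appended by ASM_CALL on one compiler but not the other can no
longer shift the meaning of %3 or %4.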
> > Not sure how to move forward from here. Not even using an ugly #ifdef
> > seems to be a halfway reasonable solution, since get_user() isn't the
> > only place using this construct and #ifdefs would result in highly
> > redundant macros in multiple places.
> 
> I still think ASM_CALL might work. The below patch is what I came up
> with last week before I got pulled into some other work. I'll be out on
> vacation next week but I hope to finish the patch after that.

Thanks for working on this. Enjoy your vacation!

Matthias

> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 1b020381ab38..5da4bcbeebab 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -216,14 +216,14 @@ static inline int alternatives_text_reserved(void *start, void *end)
>   * Otherwise, old function is used.
>   */
>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
> -			   output, input...)				      \
> +			   output, input, clobbers...)			      \
>  {									      \
> -	register void *__sp asm(_ASM_SP);				      \
> -	asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
> -		      "call %P[new2]", feature2)			      \
> -		      : output, "+r" (__sp)				      \
> -		      : [old] "i" (oldfunc), [new1] "i" (newfunc1),	      \
> -			[new2] "i" (newfunc2), ## input);		      \
> +	ASM_CALL(ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,    \
> +			       "call %P[new2]", feature2),		      \
> +		 ARGS(output),						      \
> +		 ARGS([old] "i" (oldfunc), [new1] "i" (newfunc1),	      \
> +		      [new2] "i" (newfunc2), ARGS(input)),		      \
> +		 ## clobbers);						      \
>  }
>  
>  /*
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index b4a0d43248cf..21adcc8e739f 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -45,8 +45,8 @@ static inline void clear_page(void *page)
>  			   clear_page_rep, X86_FEATURE_REP_GOOD,
>  			   clear_page_erms, X86_FEATURE_ERMS,
>  			   "=D" (page),
> -			   "0" (page)
> -			   : "memory", "rax", "rcx");
> +			   "0" (page),
> +			   ARGS("memory", "rax", "rcx"));
>  }
>  
>  void copy_page(void *to, void *from);
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index cb976bab6299..2a585b799a3e 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -471,8 +471,7 @@ int paravirt_disable_iospace(void);
>   */
>  #ifdef CONFIG_X86_32
>  #define PVOP_VCALL_ARGS							\
> -	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;	\
> -	register void *__sp asm("esp")
> +	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
>  #define PVOP_CALL_ARGS	PVOP_VCALL_ARGS
>  
>  #define PVOP_CALL_ARG1(x)	"a" ((unsigned long)(x))
> @@ -492,8 +491,7 @@ int paravirt_disable_iospace(void);
>  /* [re]ax isn't an arg, but the return val */
>  #define PVOP_VCALL_ARGS						\
>  	unsigned long __edi = __edi, __esi = __esi,		\
> -		__edx = __edx, __ecx = __ecx, __eax = __eax;	\
> -	register void *__sp asm("rsp")
> +		__edx = __edx, __ecx = __ecx, __eax = __eax;
>  #define PVOP_CALL_ARGS	PVOP_VCALL_ARGS
>  
>  #define PVOP_CALL_ARG1(x)	"D" ((unsigned long)(x))
> @@ -541,24 +539,24 @@ int paravirt_disable_iospace(void);
>  		/* This is 32-bit specific, but is okay in 64-bit */	\
>  		/* since this condition will never hold */		\
>  		if (sizeof(rettype) > sizeof(unsigned long)) {		\
> -			asm volatile(pre				\
> -				     paravirt_alt(PARAVIRT_CALL)	\
> -				     post				\
> -				     : call_clbr, "+r" (__sp)		\
> -				     : paravirt_type(op),		\
> -				       paravirt_clobber(clbr),		\
> -				       ##__VA_ARGS__			\
> -				     : "memory", "cc" extra_clbr);	\
> +			ASM_CALL(pre					\
> +				 paravirt_alt(PARAVIRT_CALL)		\
> +				 post,					\
> +				 ARGS(call_clbr),			\
> +				 ARGS(paravirt_type(op),		\
> +				      paravirt_clobber(clbr),		\
> +				      ##__VA_ARGS__),			\
> +				 ARGS("memory", "cc" extra_clbr));	\
>  			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
>  		} else {						\
> -			asm volatile(pre				\
> -				     paravirt_alt(PARAVIRT_CALL)	\
> -				     post				\
> -				     : call_clbr, "+r" (__sp)		\
> -				     : paravirt_type(op),		\
> -				       paravirt_clobber(clbr),		\
> -				       ##__VA_ARGS__			\
> -				     : "memory", "cc" extra_clbr);	\
> +			ASM_CALL(pre					\
> +				 paravirt_alt(PARAVIRT_CALL)		\
> +				 post,					\
> +				 ARGS(call_clbr),			\
> +				 ARGS(paravirt_type(op),		\
> +				      paravirt_clobber(clbr),		\
> +				      ##__VA_ARGS__),			\
> +				 ARGS("memory", "cc" extra_clbr));	\
>  			__ret = (rettype)(__eax & PVOP_RETMASK(rettype)); \
>  		}							\
>  		__ret;							\
> @@ -578,14 +576,14 @@ int paravirt_disable_iospace(void);
>  	({								\
>  		PVOP_VCALL_ARGS;					\
>  		PVOP_TEST_NULL(op);					\
> -		asm volatile(pre					\
> -			     paravirt_alt(PARAVIRT_CALL)		\
> -			     post					\
> -			     : call_clbr, "+r" (__sp)			\
> -			     : paravirt_type(op),			\
> -			       paravirt_clobber(clbr),			\
> -			       ##__VA_ARGS__				\
> -			     : "memory", "cc" extra_clbr);		\
> +		ASM_CALL(pre						\
> +			 paravirt_alt(PARAVIRT_CALL)			\
> +			 post,						\
> +			 ARGS(call_clbr),				\
> +			 ARGS(paravirt_type(op),			\
> +			      paravirt_clobber(clbr),			\
> +			      ##__VA_ARGS__),				\
> +			 ARGS("memory", "cc" extra_clbr));		\
>  	})
>  
>  #define __PVOP_VCALL(op, pre, post, ...)				\
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index ec1f3c651150..19a034cbde37 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -102,16 +102,14 @@ static __always_inline bool should_resched(int preempt_offset)
>  extern asmlinkage void ___preempt_schedule(void);
>  # define __preempt_schedule()						\
>  ({									\
> -	register void *__sp asm(_ASM_SP);				\
> -	asm volatile ("call ___preempt_schedule" : "+r"(__sp));		\
> +	ASM_CALL("call ___preempt_schedule",,);				\
>  })
>  
>  extern asmlinkage void preempt_schedule(void);
>  extern asmlinkage void ___preempt_schedule_notrace(void);
>  # define __preempt_schedule_notrace()					\
>  ({									\
> -	register void *__sp asm(_ASM_SP);				\
> -	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \
> +	ASM_CALL("call ___preempt_schedule_notrace",,);			\
>  })
>  extern asmlinkage void preempt_schedule_notrace(void);
>  #endif
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 550bc4ab0df4..2bbfb777c8bb 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -677,20 +677,19 @@ static inline void sync_core(void)
>  	 * Like all of Linux's memory ordering operations, this is a
>  	 * compiler barrier as well.
>  	 */
> -	register void *__sp asm(_ASM_SP);
>  
>  #ifdef CONFIG_X86_32
> -	asm volatile (
> +	ASM_CALL(
>  		"pushfl\n\t"
>  		"pushl %%cs\n\t"
>  		"pushl $1f\n\t"
>  		"iret\n\t"
> -		"1:"
> -		: "+r" (__sp) : : "memory");
> +		"1:",
> +		, , "memory");
>  #else
>  	unsigned int tmp;
>  
> -	asm volatile (
> +	ASM_CALL(
>  		UNWIND_HINT_SAVE
>  		"mov %%ss, %0\n\t"
>  		"pushq %q0\n\t"
> @@ -702,8 +701,8 @@ static inline void sync_core(void)
>  		"pushq $1f\n\t"
>  		"iretq\n\t"
>  		UNWIND_HINT_RESTORE
> -		"1:"
> -		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
> +		"1:",
> +		"=&r" (tmp), , ARGS("cc", "memory"));
>  #endif
>  }
>  
> diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
> index a34e0d4b957d..9d20f4fb5d7c 100644
> --- a/arch/x86/include/asm/rwsem.h
> +++ b/arch/x86/include/asm/rwsem.h
> @@ -99,25 +99,24 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
>  /*
>   * lock for writing
>   */
> -#define ____down_write(sem, slow_path)			\
> -({							\
> -	long tmp;					\
> -	struct rw_semaphore* ret;			\
> -	register void *__sp asm(_ASM_SP);		\
> -							\
> -	asm volatile("# beginning down_write\n\t"	\
> -		     LOCK_PREFIX " xadd %1,(%4)\n\t"	\
> -		     /* adds 0xffff0001, returns the old value */ \
> -		     " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
> -		     /* was the active mask 0 before? */\
> -		     " jz 1f\n"				\
> -		     " call " slow_path "\n"		\
> -		     "1:\n"				\
> -		     "# ending down_write"		\
> -		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
> -		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
> -		     : "memory", "cc");			\
> -	ret;						\
> +#define ____down_write(sem, slow_path)			\
> +({							\
> +	long tmp;					\
> +	struct rw_semaphore* ret;			\
> +							\
> +	ASM_CALL("# beginning down_write\n\t"		\
> +		 LOCK_PREFIX " xadd %1,(%3)\n\t"	\
> +		 /* adds 0xffff0001, returns the old value */ \
> +		 " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
> +		 /* was the active mask 0 before? */	\
> +		 " jz 1f\n"				\
> +		 " call " slow_path "\n"		\
> +		 "1:\n"					\
> +		 "# ending down_write",			\
> +		 ARGS("+m" (sem->count), "=d" (tmp), "=a" (ret)), \
> +		 ARGS("a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)), \
> +		 ARGS("memory", "cc"));			\
> +	ret;						\
>  })
>  
>  static inline void __down_write(struct rw_semaphore *sem)
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 184eb9894dae..c7be076ee703 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -166,12 +166,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>  ({									\
>  	int __ret_gu;							\
>  	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
> -	register void *__sp asm(_ASM_SP);				\
>  	__chk_user_ptr(ptr);						\
>  	might_fault();							\
> -	asm volatile("call __get_user_%P4"				\
> -		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
> -		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
> +	ASM_CALL("call __get_user_%P3",					\
> +		 ARGS("=a" (__ret_gu), "=r" (__val_gu)),		\
> +		 ARGS("0" (ptr), "i" (sizeof(*(ptr)))));		\
>  	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
>  	__builtin_expect(__ret_gu, 0);					\
>  })
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index b16f6a1d8b26..8b63e2cb1eaf 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -38,10 +38,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
>  				 X86_FEATURE_REP_GOOD,
>  				 copy_user_enhanced_fast_string,
>  				 X86_FEATURE_ERMS,
> -				 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
> -					     "=d" (len)),
> -				 "1" (to), "2" (from), "3" (len)
> -				 : "memory", "rcx", "r8", "r9", "r10", "r11");
> +				 ARGS("=a" (ret), "=D" (to), "=S" (from), "=d" (len)),
> +				 ARGS("1" (to), "2" (from), "3" (len)),
> +				 ARGS("memory", "rcx", "r8", "r9", "r10", "r11"));
>  	return ret;
>  }
>  
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index 11071fcd630e..2feddeeec1d1 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -114,9 +114,8 @@ extern struct { char _entry[32]; } hypercall_page[];
>  	register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
>  	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
>  	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
> -	register void *__sp asm(_ASM_SP);
>  
> -#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__sp)
> +#define __HYPERCALL_0PARAM	"=r" (__res)
>  #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
>  #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
>  #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
> @@ -146,10 +145,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_0ARG();						\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_0PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER0);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_0PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER0);					\
>  	(type)__res;							\
>  })
>  
> @@ -157,10 +156,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_1ARG(a1);						\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_1PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER1);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_1PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER1);					\
>  	(type)__res;							\
>  })
>  
> @@ -168,10 +167,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_2ARG(a1, a2);					\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_2PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER2);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_2PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER2);					\
>  	(type)__res;							\
>  })
>  
> @@ -179,10 +178,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_3ARG(a1, a2, a3);					\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_3PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER3);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_3PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER3);					\
>  	(type)__res;							\
>  })
>  
> @@ -190,10 +189,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_4ARG(a1, a2, a3, a4);				\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_4PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER4);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_4PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER4);					\
>  	(type)__res;							\
>  })
>  
> @@ -201,10 +200,10 @@ extern struct { char _entry[32]; } hypercall_page[];
>  ({									\
>  	__HYPERCALL_DECLS;						\
>  	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);				\
> -	asm volatile (__HYPERCALL					\
> -		      : __HYPERCALL_5PARAM				\
> -		      : __HYPERCALL_ENTRY(name)				\
> -		      : __HYPERCALL_CLOBBER5);				\
> +	ASM_CALL(__HYPERCALL,						\
> +		 __HYPERCALL_5PARAM,					\
> +		 __HYPERCALL_ENTRY(name),				\
> +		 __HYPERCALL_CLOBBER5);					\
>  	(type)__res;							\
>  })
>  
> @@ -218,10 +217,10 @@ privcmd_call(unsigned call,
>  	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
>  
>  	stac();
> -	asm volatile("call *%[call]"
> -		     : __HYPERCALL_5PARAM
> -		     : [call] "a" (&hypercall_page[call])
> -		     : __HYPERCALL_CLOBBER5);
> +	ASM_CALL("call *%[call]",
> +		 __HYPERCALL_5PARAM,
> +		 [call] "a" (&hypercall_page[call]),
> +		 __HYPERCALL_CLOBBER5);
>  	clac();
>  
>  	return (long)__res;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index fb0055953fbc..4d0d326029a7 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -5284,16 +5284,15 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
>  
>  static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>  {
> -	register void *__sp asm(_ASM_SP);
>  	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
>  
>  	if (!(ctxt->d & ByteOp))
>  		fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
>  
> -	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
> -	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
> -	      [fastop]"+S"(fop), "+r"(__sp)
> -	    : "c"(ctxt->src2.val));
> +	ASM_CALL("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n",
> +		 ARGS("+a"(ctxt->dst.val), "+d"(ctxt->src.val),
> +		      [flags]"+D"(flags), [fastop]"+S"(fop)),
> +		 ARGS("c"(ctxt->src2.val)));
>  
>  	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
>  	if (!fop) /* exception is returned in fop variable */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ffd469ecad57..d42806ad3c9c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8671,7 +8671,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>  static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>  {
>  	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> -	register void *__sp asm(_ASM_SP);
>  
>  	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>  			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
> @@ -8686,7 +8685,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>  		vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
>  		desc = (gate_desc *)vmx->host_idt_base + vector;
>  		entry = gate_offset(*desc);
> -		asm volatile(
> +		ASM_CALL(
>  #ifdef CONFIG_X86_64
>  			"mov %%" _ASM_SP ", %[sp]\n\t"
>  			"and $0xfffffffffffffff0, %%" _ASM_SP "\n\t"
> @@ -8695,16 +8694,14 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>  #endif
>  			"pushf\n\t"
>  			__ASM_SIZE(push) " $%c[cs]\n\t"
> -			"call *%[entry]\n\t"
> -			:
> +			"call *%[entry]\n\t",
>  #ifdef CONFIG_X86_64
> -			[sp]"=&r"(tmp),
> +			[sp]"=&r"(tmp)
>  #endif
> -			"+r"(__sp)
> -			:
> -			[entry]"r"(entry),
> -			[ss]"i"(__KERNEL_DS),
> -			[cs]"i"(__KERNEL_CS)
> +			,
> +			ARGS([entry]"r"(entry),
> +			     [ss]"i"(__KERNEL_DS),
> +			     [cs]"i"(__KERNEL_CS))
>  			);
>  	}
>  }
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2a1fa10c6a98..ca642ac3a390 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -802,7 +802,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  	if (is_vmalloc_addr((void *)address) &&
>  	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
>  	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
> -		register void *__sp asm("rsp");
>  		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
>  		/*
>  		 * We're likely to be running with very little stack space
> @@ -814,13 +813,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  		 * and then double-fault, though, because we're likely to
>  		 * break the console driver and lose most of the stack dump.
>  		 */
> -		asm volatile ("movq %[stack], %%rsp\n\t"
> -			      "call handle_stack_overflow\n\t"
> -			      "1: jmp 1b"
> -			      : "+r" (__sp)
> -			      : "D" ("kernel stack overflow (page fault)"),
> -				"S" (regs), "d" (address),
> -				[stack] "rm" (stack));
> +		ASM_CALL("movq %[stack], %%rsp\n\t"
> +			 "call handle_stack_overflow\n\t"
> +			 "1: jmp 1b",
> +			 ,
> +			 ARGS("D" ("kernel stack overflow (page fault)"),
> +			      "S" (regs), "d" (address), [stack] "rm" (stack)));
>  		unreachable();
>  	}
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 3f8c88e29a46..3a57f32a0bfd 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -589,4 +589,18 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
>  	(_________p1);							\
>  })
>  
> +#define ARGS(args...) args
> +
> +#ifdef CONFIG_FRAME_POINTER
> +
> +#define ASM_CALL(str, outputs, inputs, clobbers...)			\
> +	asm volatile(str : outputs : inputs : "sp", ## clobbers)
> +
> +#else
> +
> +#define ASM_CALL(str, outputs, inputs, clobbers...)			\
> +	asm volatile(str : outputs : inputs : clobbers);
> +
> +#endif
> +
>  #endif /* __LINUX_COMPILER_H */
> diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
> index 6a1af43862df..766a00ebf80c 100644
> --- a/tools/objtool/Documentation/stack-validation.txt
> +++ b/tools/objtool/Documentation/stack-validation.txt
> @@ -193,13 +193,8 @@ they mean, and suggestions for how to fix them.
>  
>     If it's a GCC-compiled .c file, the error may be because the function
>     uses an inline asm() statement which has a "call" instruction.  An
> -   asm() statement with a call instruction must declare the use of the
> -   stack pointer in its output operand.  For example, on x86_64:
> -
> -     register void *__sp asm("rsp");
> -     asm volatile("call func" : "+r" (__sp));
> -
> -   Otherwise the stack frame may not get created before the call.
> +   asm() statement with a call instruction must use the ASM_CALL macro,
> +   which forces the frame pointer to be saved before the call.
>  
>  
>  2. file.o: warning: objtool: .text+0x53: unreachable instruction
> @@ -221,7 +216,7 @@ they mean, and suggestions for how to fix them.
>     change ENDPROC to END.
>  
>  
> -4. file.o: warning: objtool: func(): can't find starting instruction
> +3. file.o: warning: objtool: func(): can't find starting instruction
>     or
>     file.o: warning: objtool: func()+0x11dd: can't decode instruction
>  
> @@ -230,7 +225,7 @@ they mean, and suggestions for how to fix them.
>     section like .data or .rodata.
>  
>  
> -5. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
> +4. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
>  
>     This is a kernel entry/exit instruction like sysenter or iret.  Such
>     instructions aren't allowed in a callable function, and are most
> @@ -239,7 +234,7 @@ they mean, and suggestions for how to fix them.
>     annotated with the unwind hint macros in asm/unwind_hints.h.
>  
>  
> -6. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
> +5. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
>  
>     This is a dynamic jump or a jump to an undefined symbol.  Objtool
>     assumed it's a sibling call and detected that the frame pointer
> @@ -253,7 +248,7 @@ they mean, and suggestions for how to fix them.
>     the unwind hint macros in asm/unwind_hints.h.
>  
>  
> -7. file: warning: objtool: func()+0x5c: stack state mismatch
> +6. file: warning: objtool: func()+0x5c: stack state mismatch
>  
>     The instruction's frame pointer state is inconsistent, depending on
>     which execution path was taken to reach the instruction.
> @@ -270,7 +265,7 @@ they mean, and suggestions for how to fix them.
>     asm/unwind_hints.h.
>  
>  
> -8. file.o: warning: objtool: funcA() falls through to next function funcB()
> +7. file.o: warning: objtool: funcA() falls through to next function funcB()
>  
>     This means that funcA() doesn't end with a return instruction or an
>     unconditional jump, and that objtool has determined that the function