* [PATCH 0/2] x86: Fix inline asm call constraints for clang
@ 2017-09-19 18:45 Josh Poimboeuf
  2017-09-19 18:45 ` [PATCH 1/2] objtool: Handle another GCC stack pointer adjustment bug Josh Poimboeuf
  2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
  0 siblings, 2 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-19 18:45 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

Patch 1 is a bug fix for an objtool issue which was uncovered by patch 2.

Patch 2 is the last fix needed for clang to be able to compile and boot
the kernel.


Josh Poimboeuf (2):
  objtool: Handle another GCC stack pointer adjustment bug
  x86/asm: Fix inline asm call constraints for clang

 arch/x86/include/asm/alternative.h               |  3 +-
 arch/x86/include/asm/asm.h                       |  9 +++++
 arch/x86/include/asm/mshyperv.h                  | 27 ++++++---------
 arch/x86/include/asm/paravirt_types.h            | 14 ++++----
 arch/x86/include/asm/preempt.h                   | 15 +++------
 arch/x86/include/asm/processor.h                 |  6 ++--
 arch/x86/include/asm/rwsem.h                     |  6 ++--
 arch/x86/include/asm/uaccess.h                   |  5 ++-
 arch/x86/include/asm/xen/hypercall.h             |  5 ++-
 arch/x86/kvm/emulate.c                           |  3 +-
 arch/x86/kvm/vmx.c                               |  4 +--
 arch/x86/mm/fault.c                              |  3 +-
 tools/objtool/Documentation/stack-validation.txt | 20 ++++++++---
 tools/objtool/arch/x86/decode.c                  |  6 ++--
 tools/objtool/check.c                            | 43 ++++++++++++++++--------
 15 files changed, 92 insertions(+), 77 deletions(-)

-- 
2.13.5


* [PATCH 1/2] objtool: Handle another GCC stack pointer adjustment bug
  2017-09-19 18:45 [PATCH 0/2] x86: Fix inline asm call constraints for clang Josh Poimboeuf
@ 2017-09-19 18:45 ` Josh Poimboeuf
  2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
  1 sibling, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-19 18:45 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

The kbuild bot reported the following warning with GCC 4.4 and a
randconfig:

  net/socket.o: warning: objtool: compat_sock_ioctl()+0x1083: stack state mismatch: cfa1=7+160 cfa2=-1+0

This is caused by another GCC non-optimization, where it backs up and
restores the stack pointer for no apparent reason:

    2f91:       48 89 e0                mov    %rsp,%rax
    2f94:       4c 89 e7                mov    %r12,%rdi
    2f97:       4c 89 f6                mov    %r14,%rsi
    2f9a:       ba 20 00 00 00          mov    $0x20,%edx
    2f9f:       48 89 c4                mov    %rax,%rsp

This issue would have been happily ignored before the following commit:

  dd88a0a0c861 ("objtool: Handle GCC stack pointer adjustment bug")

But now that objtool is paying attention to such stack pointer writes
to/from a register, it needs to understand them properly.  In this case
that means recognizing that the "mov %rsp, %rax" instruction is
potentially a backup of the stack pointer.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: dd88a0a0c861 ("objtool: Handle GCC stack pointer adjustment bug")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/arch/x86/decode.c |  6 +++---
 tools/objtool/check.c           | 43 +++++++++++++++++++++++++++--------------
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 0e8c8ec4fd4e..0f22768c0d4d 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -208,14 +208,14 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 		break;
 
 	case 0x89:
-		if (rex == 0x48 && modrm == 0xe5) {
+		if (rex_w && !rex_r && modrm_mod == 3 && modrm_reg == 4) {
 
-			/* mov %rsp, %rbp */
+			/* mov %rsp, reg */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_REG;
 			op->src.reg = CFI_SP;
 			op->dest.type = OP_DEST_REG;
-			op->dest.reg = CFI_BP;
+			op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
 			break;
 		}
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index f744617c9946..a0c518ecf085 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1203,24 +1203,39 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 		switch (op->src.type) {
 
 		case OP_SRC_REG:
-			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP) {
+			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
+			    cfa->base == CFI_SP &&
+			    regs[CFI_BP].base == CFI_CFA &&
+			    regs[CFI_BP].offset == -cfa->offset) {
+
+				/* mov %rsp, %rbp */
+				cfa->base = op->dest.reg;
+				state->bp_scratch = false;
+			}
 
-				if (cfa->base == CFI_SP &&
-				    regs[CFI_BP].base == CFI_CFA &&
-				    regs[CFI_BP].offset == -cfa->offset) {
+			else if (op->src.reg == CFI_SP &&
+				 op->dest.reg == CFI_BP && state->drap) {
 
-					/* mov %rsp, %rbp */
-					cfa->base = op->dest.reg;
-					state->bp_scratch = false;
-				}
+				/* drap: mov %rsp, %rbp */
+				regs[CFI_BP].base = CFI_BP;
+				regs[CFI_BP].offset = -state->stack_size;
+				state->bp_scratch = false;
+			}
 
-				else if (state->drap) {
+			else if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
 
-					/* drap: mov %rsp, %rbp */
-					regs[CFI_BP].base = CFI_BP;
-					regs[CFI_BP].offset = -state->stack_size;
-					state->bp_scratch = false;
-				}
+				/*
+				 * mov %rsp, %reg
+				 *
+				 * This is needed for the rare case where GCC
+				 * does:
+				 *
+				 *   mov    %rsp, %rax
+				 *   ...
+				 *   mov    %rax, %rsp
+				 */
+				state->vals[op->dest.reg].base = CFI_CFA;
+				state->vals[op->dest.reg].offset = -state->stack_size;
 			}
 
 			else if (op->dest.reg == cfa->base) {
-- 
2.13.5


* [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 18:45 [PATCH 0/2] x86: Fix inline asm call constraints for clang Josh Poimboeuf
  2017-09-19 18:45 ` [PATCH 1/2] objtool: Handle another GCC stack pointer adjustment bug Josh Poimboeuf
@ 2017-09-19 18:45 ` Josh Poimboeuf
  2017-09-19 21:55   ` Alexander Potapenko
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-19 18:45 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
  	register void *__sp asm(_ASM_SP);
  	asm("call bar" : "+r" (__sp))
  }

Unfortunately, that pattern causes clang to corrupt the stack pointer.

There's actually an easier way to achieve the same goal in GCC, without
causing trouble for clang.  If we declare the stack pointer register
variable as a global variable, and remove the constraint altogether,
that convinces GCC to always set up the frame pointer before inserting
*any* inline asm.

It basically acts as if *every* inline asm statement has a CALL
instruction.  It's a bit overkill, but the performance impact should be
negligible.
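
For illustration, here is the foo() example from above, rewritten to
rely on the global register variable (a minimal sketch; bar() is a
stand-in, and the declaration itself lives in asm/asm.h):

  /* Declared once, at global scope, in asm/asm.h: */
  register unsigned int __sp_unused asm("esp");

  static inline void foo(void)
  {
  	/*
  	 * No stack pointer constraint is needed: the global register
  	 * variable above already convinces GCC to set up the frame
  	 * pointer before emitting any inline asm.
  	 */
  	asm volatile("call bar");
  }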

Here are the vmlinux .text size differences with the following configs
on GCC:

- defconfig
- defconfig without frame pointers
- Fedora distro config
- Fedora distro config without frame pointers

	defconfig	defconfig-nofp	distro		distro-nofp
before	9796300		9466764		9076191		8789745
after	9796941		9462859		9076381		8785325

With frame pointers, the text size increases slightly.  Without frame
pointers, the text size decreases a little more significantly.

Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/alternative.h               |  3 +--
 arch/x86/include/asm/asm.h                       |  9 ++++++++
 arch/x86/include/asm/mshyperv.h                  | 27 ++++++++++--------------
 arch/x86/include/asm/paravirt_types.h            | 14 ++++++------
 arch/x86/include/asm/preempt.h                   | 15 +++++--------
 arch/x86/include/asm/processor.h                 |  6 ++----
 arch/x86/include/asm/rwsem.h                     |  6 +++---
 arch/x86/include/asm/uaccess.h                   |  5 ++---
 arch/x86/include/asm/xen/hypercall.h             |  5 ++---
 arch/x86/kvm/emulate.c                           |  3 +--
 arch/x86/kvm/vmx.c                               |  4 +---
 arch/x86/mm/fault.c                              |  3 +--
 tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
 13 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..7aeb1cef4204 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,   \
 			   output, input...)				      \
 {									      \
-	register void *__sp asm(_ASM_SP);				      \
 	asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
 		"call %P[new2]", feature2)				      \
-		: output, "+r" (__sp)					      \
+		: output						      \
 		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
 		  [new2] "i" (newfunc2), ## input);			      \
 }
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 676ee5807d86..ff8921d4615b 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -132,4 +132,13 @@
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
+#ifndef __ASSEMBLY__
+/*
+ * This variable declaration has the side effect of forcing GCC to always set
+ * up the frame pointer before inserting any inline asm.  This is necessary
+ * because some inline asm statements have CALL instructions.
+ */
+register unsigned int __sp_unused asm("esp");
+#endif
+
 #endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 63cc96f064dc..1c913ae27f99 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 input_address = input ? virt_to_phys(input) : 0;
 	u64 output_address = output ? virt_to_phys(output) : 0;
 	u64 hv_status;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("mov %4, %%r8\n"
-			     "call *%5"
-			     : "=a" (hv_status), "+r" (__sp),
-			       "+c" (control), "+d" (input_address)
+	__asm__ __volatile__("mov %3, %%r8\n"
+			     "call *%4"
+			     : "=a" (hv_status), "+c" (control),
+			       "+d" (input_address)
 			     :  "r" (output_address), "m" (hv_hypercall_pg)
 			     : "cc", "memory", "r8", "r9", "r10", "r11");
 #else
@@ -200,9 +199,8 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("call *%7"
-			     : "=A" (hv_status),
-			       "+c" (input_address_lo), "+r" (__sp)
+	__asm__ __volatile__("call *%6"
+			     : "=A" (hv_status), "+c" (input_address_lo)
 			     : "A" (control),
 			       "b" (input_address_hi),
 			       "D"(output_address_hi), "S"(output_address_lo),
@@ -224,13 +222,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 {
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	{
-		__asm__ __volatile__("call *%4"
-				     : "=a" (hv_status), "+r" (__sp),
-				       "+c" (control), "+d" (input1)
+		__asm__ __volatile__("call *%3"
+				     : "=a" (hv_status), "+c" (control),
+				       "+d" (input1)
 				     : "m" (hv_hypercall_pg)
 				     : "cc", "r8", "r9", "r10", "r11");
 	}
@@ -239,10 +236,8 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 		u32 input1_hi = upper_32_bits(input1);
 		u32 input1_lo = lower_32_bits(input1);
 
-		__asm__ __volatile__ ("call *%5"
-				      : "=A"(hv_status),
-					"+c"(input1_lo),
-					"+r"(__sp)
+		__asm__ __volatile__ ("call *%4"
+				      : "=A"(hv_status), "+c"(input1_lo)
 				      :	"A" (control),
 					"b" (input1_hi),
 					"m" (hv_hypercall_pg)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42873edd9f9d..b8966ed71ba5 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -459,8 +459,8 @@ int paravirt_disable_iospace(void);
  */
 #ifdef CONFIG_X86_32
 #define PVOP_VCALL_ARGS							\
-	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;	\
-	register void *__sp asm("esp")
+	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
+
 #define PVOP_CALL_ARGS			PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"a" ((unsigned long)(x))
@@ -480,8 +480,8 @@ int paravirt_disable_iospace(void);
 /* [re]ax isn't an arg, but the return val */
 #define PVOP_VCALL_ARGS						\
 	unsigned long __edi = __edi, __esi = __esi,		\
-		__edx = __edx, __ecx = __ecx, __eax = __eax;	\
-	register void *__sp asm("rsp")
+		__edx = __edx, __ecx = __ecx, __eax = __eax;
+
 #define PVOP_CALL_ARGS		PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"D" ((unsigned long)(x))
@@ -532,7 +532,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr			\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -542,7 +542,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr			\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -569,7 +569,7 @@ int paravirt_disable_iospace(void);
 		asm volatile(pre					\
 			     paravirt_alt(PARAVIRT_CALL)		\
 			     post					\
-			     : call_clbr, "+r" (__sp)			\
+			     : call_clbr				\
 			     : paravirt_type(op),			\
 			       paravirt_clobber(clbr),			\
 			       ##__VA_ARGS__				\
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..f66b189a1142 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -100,19 +100,14 @@ static __always_inline bool should_resched(int preempt_offset)
 
 #ifdef CONFIG_PREEMPT
   extern asmlinkage void ___preempt_schedule(void);
-# define __preempt_schedule()					\
-({								\
-	register void *__sp asm(_ASM_SP);			\
-	asm volatile ("call ___preempt_schedule" : "+r"(__sp));	\
-})
+# define __preempt_schedule() \
+	asm volatile ("call ___preempt_schedule")
 
   extern asmlinkage void preempt_schedule(void);
   extern asmlinkage void ___preempt_schedule_notrace(void);
-# define __preempt_schedule_notrace()					\
-({									\
-	register void *__sp asm(_ASM_SP);				\
-	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp));	\
-})
+# define __preempt_schedule_notrace() \
+	asm volatile ("call ___preempt_schedule_notrace")
+
   extern asmlinkage void preempt_schedule_notrace(void);
 #endif
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..7a7969211527 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,8 +677,6 @@ static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
-	register void *__sp asm(_ASM_SP);
-
 #ifdef CONFIG_X86_32
 	asm volatile (
 		"pushfl\n\t"
@@ -686,7 +684,7 @@ static inline void sync_core(void)
 		"pushl $1f\n\t"
 		"iret\n\t"
 		"1:"
-		: "+r" (__sp) : : "memory");
+		: : : "memory");
 #else
 	unsigned int tmp;
 
@@ -703,7 +701,7 @@ static inline void sync_core(void)
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+		: "=&r" (tmp) : : "cc", "memory");
 #endif
 }
 
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..f8f127a2a300 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -103,10 +103,9 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 ({							\
 	long tmp;					\
 	struct rw_semaphore* ret;			\
-	register void *__sp asm(_ASM_SP);		\
 							\
 	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %1,(%4)\n\t"	\
+		     LOCK_PREFIX "  xadd      %1,(%3)\n\t"	\
 		     /* adds 0xffff0001, returns the old value */ \
 		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
 		     /* was the active mask 0 before? */\
@@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 		     "  call " slow_path "\n"		\
 		     "1:\n"				\
 		     "# ending down_write"		\
-		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
+		     : "+m" (sem->count), "=d" (tmp),	\
+		       "=a" (ret)			\
 		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
 		     : "memory", "cc");			\
 	ret;						\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..113ddb8ce0d8 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,11 +166,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({									\
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
-	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
-	asm volatile("call __get_user_%P4"				\
-		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
+	asm volatile("call __get_user_%P3"				\
+		     : "=a" (__ret_gu), "=r" (__val_gu)			\
 		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
 	__builtin_expect(__ret_gu, 0);					\
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9606688caa4b..6264dba01896 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
 	register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
 	register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
 	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
-	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
-	register void *__sp asm(_ASM_SP);
+	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
 
-#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM	"=r" (__res)
 #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
 #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
 #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16bf6655aa85..5e82e0821c1f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
 
 static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 {
-	register void *__sp asm(_ASM_SP);
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
@@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
-	      [fastop]"+S"(fop), "+r"(__sp)
+	      [fastop]"+S"(fop)
 	    : "c"(ctxt->src2.val));
 
 	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 06c0c6d0541e..a693362f41af 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 {
 	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	register void *__sp asm(_ASM_SP);
 
 	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
 			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -9063,9 +9062,8 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 			"call *%[entry]\n\t"
 			:
 #ifdef CONFIG_X86_64
-			[sp]"=&r"(tmp),
+			[sp]"=&r"(tmp)
 #endif
-			"+r"(__sp)
 			:
 			[entry]"r"(entry),
 			[ss]"i"(__KERNEL_DS),
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b836a7274e12..4457e41378e4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		register void *__sp asm("rsp");
 		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
@@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		asm volatile ("movq %[stack], %%rsp\n\t"
 			      "call handle_stack_overflow\n\t"
 			      "1: jmp 1b"
-			      : "+r" (__sp)
+			      :
 			      : "D" ("kernel stack overflow (page fault)"),
 				"S" (regs), "d" (address),
 				[stack] "rm" (stack));
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 6a1af43862df..18af4aa7488d 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -192,12 +192,22 @@ they mean, and suggestions for how to fix them.
    use the manual unwind hint macros in asm/unwind_hints.h.
 
    If it's a GCC-compiled .c file, the error may be because the function
-   uses an inline asm() statement which has a "call" instruction.  An
-   asm() statement with a call instruction must declare the use of the
-   stack pointer in its output operand.  For example, on x86_64:
+   uses an inline asm() statement which has a "call" instruction.  To
+   fix this, either:
 
-     register void *__sp asm("rsp");
-     asm volatile("call func" : "+r" (__sp));
+     a) list the stack pointer as an input/output constraint;
+
+        register void *__sp asm("rsp");
+        asm volatile("call func" : "+r" (__sp));
+
+     or
+
+     b) declare a global register variable for the stack pointer.
+
+	arch/x86/include/asm/asm.h:register unsigned int __sp_unused asm("esp");
+
+	This is the strategy used on x86_64.  It ensures that GCC always
+	sets up the frame pointer before inserting *any* inline asm.
 
    Otherwise the stack frame may not get created before the call.
 
-- 
2.13.5


* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
@ 2017-09-19 21:55   ` Alexander Potapenko
  2017-09-19 22:25     ` Alexander Potapenko
  2017-09-20  1:18   ` Josh Poimboeuf
  2017-09-20 17:32   ` H. Peter Anvin
  2 siblings, 1 reply; 20+ messages in thread
From: Alexander Potapenko @ 2017-09-19 21:55 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> index 676ee5807d86..ff8921d4615b 100644
> --- a/arch/x86/include/asm/asm.h
> +++ b/arch/x86/include/asm/asm.h
> @@ -132,4 +132,13 @@
>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>  #endif
>
> +#ifndef __ASSEMBLY__
> +/*
> + * This variable declaration has the side effect of forcing GCC to always set
> + * up the frame pointer before inserting any inline asm.  This is necessary
> + * because some inline asm statements have CALL instructions.
> + */
> +register unsigned int __sp_unused asm("esp");
Shouldn't this be "asm(_ASM_SP)"?
> +#endif
> +
>  #endif /* _ASM_X86_ASM_H */



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg


* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 21:55   ` Alexander Potapenko
@ 2017-09-19 22:25     ` Alexander Potapenko
  2017-09-20  1:47       ` Josh Poimboeuf
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Potapenko @ 2017-09-19 22:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko <glider@google.com> wrote:
> On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>> index 676ee5807d86..ff8921d4615b 100644
>> --- a/arch/x86/include/asm/asm.h
>> +++ b/arch/x86/include/asm/asm.h
>> @@ -132,4 +132,13 @@
>>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>>  #endif
>>
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * This variable declaration has the side effect of forcing GCC to always set
>> + * up the frame pointer before inserting any inline asm.  This is necessary
>> + * because some inline asm statements have CALL instructions.
>> + */
>> +register unsigned int __sp_unused asm("esp");
> Shouldn't this be "asm(_ASM_SP)"?
Answering my own question: this can't be _ASM_SP because of the
realmode code, which is built with -m16, whereas _ASM_SP is "rsp" on
64-bit builds.  The patch works fine with "esp" though - declaring an
ESP register variable is almost certainly enough to reserve the whole
RSP in an x86_64 build (see the sketch below the quoted hunk).
>> +#endif
>> +
>>  #endif /* _ASM_X86_ASM_H */
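
Roughly, the register name selection looks like this (a simplified
sketch - the real asm.h macros go through the __ASM_SEL()/__ASM_REG()
stringification helpers):

  #ifdef CONFIG_X86_32
  # define _ASM_SP "esp"	/* 32-bit builds */
  #else
  # define _ASM_SP "rsp"	/* 64-bit builds; not a valid register
  				   name for the -m16 realmode objects */
  #endif

  /*
   * "esp" is accepted at -m16, -m32 and -m64 alike, and on x86_64
   * it still reserves the full %rsp, since %esp is its low 32 bits:
   */
  register unsigned int __sp_unused asm("esp");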
>>          * compiler barrier as well.
>>          */
>> -       register void *__sp asm(_ASM_SP);
>> -
>>  #ifdef CONFIG_X86_32
>>         asm volatile (
>>                 "pushfl\n\t"
>> @@ -686,7 +684,7 @@ static inline void sync_core(void)
>>                 "pushl $1f\n\t"
>>                 "iret\n\t"
>>                 "1:"
>> -               : "+r" (__sp) : : "memory");
>> +               : : : "memory");
>>  #else
>>         unsigned int tmp;
>>
>> @@ -703,7 +701,7 @@ static inline void sync_core(void)
>>                 "iretq\n\t"
>>                 UNWIND_HINT_RESTORE
>>                 "1:"
>> -               : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
>> +               : "=&r" (tmp) : : "cc", "memory");
>>  #endif
>>  }
>>
>> diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
>> index a34e0d4b957d..f8f127a2a300 100644
>> --- a/arch/x86/include/asm/rwsem.h
>> +++ b/arch/x86/include/asm/rwsem.h
>> @@ -103,10 +103,9 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
>>  ({                                                     \
>>         long tmp;                                       \
>>         struct rw_semaphore* ret;                       \
>> -       register void *__sp asm(_ASM_SP);               \
>>                                                         \
>>         asm volatile("# beginning down_write\n\t"       \
>> -                    LOCK_PREFIX "  xadd      %1,(%4)\n\t"      \
>> +                    LOCK_PREFIX "  xadd      %1,(%3)\n\t"      \
>>                      /* adds 0xffff0001, returns the old value */ \
>>                      "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
>>                      /* was the active mask 0 before? */\
>> @@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
>>                      "  call " slow_path "\n"           \
>>                      "1:\n"                             \
>>                      "# ending down_write"              \
>> -                    : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
>> +                    : "+m" (sem->count), "=d" (tmp),   \
>> +                      "=a" (ret)                       \
>>                      : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
>>                      : "memory", "cc");                 \
>>         ret;                                            \
>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>> index 184eb9894dae..113ddb8ce0d8 100644
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -166,11 +166,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>>  ({                                                                     \
>>         int __ret_gu;                                                   \
>>         register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
>> -       register void *__sp asm(_ASM_SP);                               \
>>         __chk_user_ptr(ptr);                                            \
>>         might_fault();                                                  \
>> -       asm volatile("call __get_user_%P4"                              \
>> -                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
>> +       asm volatile("call __get_user_%P3"                              \
>> +                    : "=a" (__ret_gu), "=r" (__val_gu)                 \
>>                      : "0" (ptr), "i" (sizeof(*(ptr))));                \
>>         (x) = (__force __typeof__(*(ptr))) __val_gu;                    \
>>         __builtin_expect(__ret_gu, 0);                                  \
>> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
>> index 9606688caa4b..6264dba01896 100644
>> --- a/arch/x86/include/asm/xen/hypercall.h
>> +++ b/arch/x86/include/asm/xen/hypercall.h
>> @@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
>>         register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
>>         register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
>>         register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
>> -       register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
>> -       register void *__sp asm(_ASM_SP);
>> +       register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
>>
>> -#define __HYPERCALL_0PARAM     "=r" (__res), "+r" (__sp)
>> +#define __HYPERCALL_0PARAM     "=r" (__res)
>>  #define __HYPERCALL_1PARAM     __HYPERCALL_0PARAM, "+r" (__arg1)
>>  #define __HYPERCALL_2PARAM     __HYPERCALL_1PARAM, "+r" (__arg2)
>>  #define __HYPERCALL_3PARAM     __HYPERCALL_2PARAM, "+r" (__arg3)
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 16bf6655aa85..5e82e0821c1f 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
>>
>>  static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>>  {
>> -       register void *__sp asm(_ASM_SP);
>>         ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
>>
>>         if (!(ctxt->d & ByteOp))
>> @@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>>
>>         asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
>>             : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
>> -             [fastop]"+S"(fop), "+r"(__sp)
>> +             [fastop]"+S"(fop)
>>             : "c"(ctxt->src2.val));
>>
>>         ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 06c0c6d0541e..a693362f41af 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>  static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>  {
>>         u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>> -       register void *__sp asm(_ASM_SP);
>>
>>         if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>>                         == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>> @@ -9063,9 +9062,8 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>>                         "call *%[entry]\n\t"
>>                         :
>>  #ifdef CONFIG_X86_64
>> -                       [sp]"=&r"(tmp),
>> +                       [sp]"=&r"(tmp)
>>  #endif
>> -                       "+r"(__sp)
>>                         :
>>                         [entry]"r"(entry),
>>                         [ss]"i"(__KERNEL_DS),
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index b836a7274e12..4457e41378e4 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>>         if (is_vmalloc_addr((void *)address) &&
>>             (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
>>              address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
>> -               register void *__sp asm("rsp");
>>                 unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
>>                 /*
>>                  * We're likely to be running with very little stack space
>> @@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>>                 asm volatile ("movq %[stack], %%rsp\n\t"
>>                               "call handle_stack_overflow\n\t"
>>                               "1: jmp 1b"
>> -                             : "+r" (__sp)
>> +                             :
>>                               : "D" ("kernel stack overflow (page fault)"),
>>                                 "S" (regs), "d" (address),
>>                                 [stack] "rm" (stack));
>> diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
>> index 6a1af43862df..18af4aa7488d 100644
>> --- a/tools/objtool/Documentation/stack-validation.txt
>> +++ b/tools/objtool/Documentation/stack-validation.txt
>> @@ -192,12 +192,22 @@ they mean, and suggestions for how to fix them.
>>     use the manual unwind hint macros in asm/unwind_hints.h.
>>
>>     If it's a GCC-compiled .c file, the error may be because the function
>> -   uses an inline asm() statement which has a "call" instruction.  An
>> -   asm() statement with a call instruction must declare the use of the
>> -   stack pointer in its output operand.  For example, on x86_64:
>> +   uses an inline asm() statement which has a "call" instruction.  To
>> +   fix this, either:
>>
>> -     register void *__sp asm("rsp");
>> -     asm volatile("call func" : "+r" (__sp));
>> +     a) list the stack pointer as an input/output constraint;
>> +
>> +        register void *__sp asm("rsp");
>> +        asm volatile("call func" : "+r" (__sp));
>> +
>> +     or
>> +
>> +     b) declare a global register variable for the stack pointer.
>> +
>> +       arch/x86/include/asm/asm.h:register unsigned int __sp_unused asm("esp");
>> +
>> +       This is the strategy used on x86_64.  It ensures that GCC always
>> +       sets up the frame pointer before inserting *any* inline asm.
>>
>>     Otherwise the stack frame may not get created before the call.
Tested-by: Alexander Potapenko <glider@google.com>
>> --
>> 2.13.5
>>

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
  2017-09-19 21:55   ` Alexander Potapenko
@ 2017-09-20  1:18   ` Josh Poimboeuf
  2017-09-20 17:54     ` Josh Poimboeuf
  2017-09-20 17:32   ` H. Peter Anvin
  2 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-20  1:18 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   	register void *__sp asm(_ASM_SP);
>   	asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 
> Here are the vmlinux .text size differences with the following configs
> on GCC:
> 
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
> 
> 	defconfig	defconfig-nofp	distro		distro-nofp
> before	9796300		9466764		9076191		8789745
> after	9796941		9462859		9076381		8785325
> 
> With frame pointers, the text size increases slightly.  Without frame
> pointers, the text size decreases, and a little more significantly.
> 
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

NAK - kbuild bot is reporting some cases where this patch doesn't force
the frame pointer setup.

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 22:25     ` Alexander Potapenko
@ 2017-09-20  1:47       ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-20  1:47 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: x86, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote:
> On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko <glider@google.com> wrote:
> > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> For inline asm statements which have a CALL instruction, we list the
> >> stack pointer as a constraint to convince GCC to ensure the frame
> >> pointer is set up first:
> >>
> >>   static inline void foo()
> >>   {
> >>         register void *__sp asm(_ASM_SP);
> >>         asm("call bar" : "+r" (__sp))
> >>   }
> >>
> >> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >>
> >> There's actually an easier way to achieve the same goal in GCC, without
> >> causing trouble for clang.  If we declare the stack pointer register
> >> variable as a global variable, and remove the constraint altogether,
> >> that convinces GCC to always set up the frame pointer before inserting
> >> *any* inline asm.
> >>
> >> It basically acts as if *every* inline asm statement has a CALL
> >> instruction.  It's a bit overkill, but the performance impact should be
> >> negligible.
> >>
> >> Here are the vmlinux .text size differences with the following configs
> >> on GCC:
> >>
> >> - defconfig
> >> - defconfig without frame pointers
> >> - Fedora distro config
> >> - Fedora distro config without frame pointers
> >>
> >>         defconfig       defconfig-nofp  distro          distro-nofp
> >> before  9796300         9466764         9076191         8789745
> >> after   9796941         9462859         9076381         8785325
> >>
> >> With frame pointers, the text size increases slightly.  Without frame
> >> pointers, the text size decreases, and a little more significantly.
> >>
> >> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> >> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> >> ---
> >>  arch/x86/include/asm/alternative.h               |  3 +--
> >>  arch/x86/include/asm/asm.h                       |  9 ++++++++
> >>  arch/x86/include/asm/mshyperv.h                  | 27 ++++++++++--------------
> >>  arch/x86/include/asm/paravirt_types.h            | 14 ++++++------
> >>  arch/x86/include/asm/preempt.h                   | 15 +++++--------
> >>  arch/x86/include/asm/processor.h                 |  6 ++----
> >>  arch/x86/include/asm/rwsem.h                     |  6 +++---
> >>  arch/x86/include/asm/uaccess.h                   |  5 ++---
> >>  arch/x86/include/asm/xen/hypercall.h             |  5 ++---
> >>  arch/x86/kvm/emulate.c                           |  3 +--
> >>  arch/x86/kvm/vmx.c                               |  4 +---
> >>  arch/x86/mm/fault.c                              |  3 +--
> >>  tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
> >>  13 files changed, 60 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> >> index 1b020381ab38..7aeb1cef4204 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
> >>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,   \
> >>                            output, input...)                                  \
> >>  {                                                                            \
> >> -       register void *__sp asm(_ASM_SP);                                     \
> >>         asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
> >>                 "call %P[new2]", feature2)                                    \
> >> -               : output, "+r" (__sp)                                         \
> >> +               : output                                                      \
> >>                 : [old] "i" (oldfunc), [new1] "i" (newfunc1),                 \
> >>                   [new2] "i" (newfunc2), ## input);                           \
> >>  }
> >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> >> index 676ee5807d86..ff8921d4615b 100644
> >> --- a/arch/x86/include/asm/asm.h
> >> +++ b/arch/x86/include/asm/asm.h
> >> @@ -132,4 +132,13 @@
> >>  /* For C file, we already have NOKPROBE_SYMBOL macro */
> >>  #endif
> >>
> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This variable declaration has the side effect of forcing GCC to always set
> >> + * up the frame pointer before inserting any inline asm.  This is necessary
> >> + * because some inline asm statements have CALL instructions.
> >> + */
> >> +register unsigned int __sp_unused asm("esp");

> > Shouldn't this be "asm(_ASM_SP)"?

> Answering my own question, this can't be _ASM_SP because of the
> realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
> The patch works fine with "esp" though - most certainly declaring an
> ESP variable is enough to reserve the whole RSP in an x86_64 build.

Right, clang failing to build the realmode code was exactly why I did
that.

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
  2017-09-19 21:55   ` Alexander Potapenko
  2017-09-20  1:18   ` Josh Poimboeuf
@ 2017-09-20 17:32   ` H. Peter Anvin
  2017-09-20 17:38     ` Dmitry Vyukov
  2017-09-20 17:51     ` Josh Poimboeuf
  2 siblings, 2 replies; 20+ messages in thread
From: H. Peter Anvin @ 2017-09-20 17:32 UTC (permalink / raw)
  To: Josh Poimboeuf, x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Linus Torvalds, Alexander Potapenko, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On 09/19/17 11:45, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   	register void *__sp asm(_ASM_SP);
>   	asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 

Again, probably negligible, but why do we need a frame pointer just
because we have a call assembly instruction?

	-hpa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 17:32   ` H. Peter Anvin
@ 2017-09-20 17:38     ` Dmitry Vyukov
  2017-09-20 17:46       ` H. Peter Anvin
  2017-09-20 17:51     ` Josh Poimboeuf
  1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2017-09-20 17:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Josh Poimboeuf, x86, LKML, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>>   static inline void foo()
>>   {
>>       register void *__sp asm(_ASM_SP);
>>       asm("call bar" : "+r" (__sp))
>>   }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang.  If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction.  It's a bit overkill, but the performance impact should be
>> negligible.
>>
>
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

I think we need just the frame itself and RSP pointing below this
frame. If we don't have a frame, the CALL instruction will smash
whatever RSP happens to point to. The compiler doesn't have to set up
RSP to point below the used part of the stack in leaf functions.
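
For illustration, a minimal sketch of that failure mode (hypothetical
code, not from the patch; bar() is a stand-in), assuming a red zone is
in use:

  /* Leaf function: the compiler may keep 'tmp' below RSP. */
  static int leaf(int x)
  {
          int tmp = x * 2;        /* e.g. lives at -4(%rsp) */

          /*
           * The CALL pushes its return address at -8(%rsp), right on
           * top of red-zone locals like 'tmp'.
           */
          asm volatile("call bar");

          return tmp;
  }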

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 17:38     ` Dmitry Vyukov
@ 2017-09-20 17:46       ` H. Peter Anvin
  2017-09-20 18:01         ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2017-09-20 17:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Josh Poimboeuf, x86, LKML, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On 09/20/17 10:38, Dmitry Vyukov wrote:
> 
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, the CALL instruction will smash
> whatever RSP happens to point to. The compiler doesn't have to set up
> RSP to point below the used part of the stack in leaf functions.
> 

In the kernel it does.  Redzoning is not allowed in the kernel, because
interrupts or exceptions would also smash the redzone.
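
For reference, a rough sketch of the layout in question (x86-64 SysV
ABI; comment only, for illustration):

  /*
   *            <higher addresses>
   *   RSP ->  +----------------------+
   *           |   red zone (128 B)   |  <- leaf locals may live here
   *           +----------------------+
   *            <lower addresses>
   *
   * An interrupt or exception taken on this stack pushes its hardware
   * frame just below RSP, straight through the red zone, which is why
   * 64-bit kernel builds use -mno-red-zone.
   */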

	-hpa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 17:32   ` H. Peter Anvin
  2017-09-20 17:38     ` Dmitry Vyukov
@ 2017-09-20 17:51     ` Josh Poimboeuf
  2017-09-21 15:35       ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-20 17:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86, linux-kernel, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Linus Torvalds, Alexander Potapenko, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> >   	register void *__sp asm(_ASM_SP);
> >   	asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> 
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

It's the frame pointer convention.  Without it, when dumping the stack
from the called function, a function will get skipped in the stack
trace: the frame that would have recorded a return address into it was
never set up.
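
To make that concrete, a frame-pointer walk just follows a linked list
(a minimal sketch; struct and names are illustrative):

  struct frame {
          struct frame *next;     /* caller's saved RBP */
          unsigned long ret;      /* return address into the caller */
  };

  static void walk(struct frame *fp)
  {
          while (fp) {
                  /* report fp->ret, then step to the caller's frame */
                  fp = fp->next;
          }
  }

If a function issues the CALL before its own frame setup, no frame
records the return address into its caller, so the walk silently skips
that entry.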

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20  1:18   ` Josh Poimboeuf
@ 2017-09-20 17:54     ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-20 17:54 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote:
> On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> >   	register void *__sp asm(_ASM_SP);
> >   	asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> > Here are the vmlinux .text size differences with the following configs
> > on GCC:
> > 
> > - defconfig
> > - defconfig without frame pointers
> > - Fedora distro config
> > - Fedora distro config without frame pointers
> > 
> > 	defconfig	defconfig-nofp	distro		distro-nofp
> > before	9796300		9466764		9076191		8789745
> > after	9796941		9462859		9076381		8785325
> > 
> > With frame pointers, the text size increases slightly.  Without frame
> > pointers, the text size decreases, and a little more significantly.
> > 
> > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> NAK - kbuild bot is reporting some cases where this patch doesn't force
> the frame pointer setup.

So it turns out that for GCC 7, it works as described above: the global
register variable results in the frame pointer getting set up before
*all* inline asm.

But for GCC 6, it doesn't work that way.  The global register variable
has no such effect.

So we need both the global register variable *and* the output constraint
after all.  Will post another patch after some more testing.
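
FWIW, combining them would presumably look something like this (just a
sketch; the exact names in the final patch may differ, and bar() is
again a stand-in):

  /* asm.h: the global register variable ... */
  register unsigned long current_stack_pointer asm(_ASM_SP);

  /* ... plus an output constraint for every inline asm with a CALL: */
  #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

  /* at a call site: */
  asm volatile("call bar" : ASM_CALL_CONSTRAINT);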

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 17:46       ` H. Peter Anvin
@ 2017-09-20 18:01         ` Dmitry Vyukov
  2017-09-20 21:07           ` Josh Poimboeuf
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2017-09-20 18:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Josh Poimboeuf, x86, LKML, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, the CALL instruction will smash
>> whatever RSP happens to point to. The compiler doesn't have to set up
>> RSP to point below the used part of the stack in leaf functions.
>>
>
> In the kernel it does.  Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

I see... But it's the same for user-space signals: the first thing a
signal handler does is skip the redzone. I guess interrupt handlers
should switch to an interrupt stack, which avoids smashing the redzone
altogether. Do you mean nested interrupts/exceptions in interrupts?
In my experience, frames in leaf functions can have a pretty large
performance penalty. I wonder if we have considered changing the
interrupt/exception handlers to avoid smashing redzones and disabling
leaf frames?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 18:01         ` Dmitry Vyukov
@ 2017-09-20 21:07           ` Josh Poimboeuf
  2017-09-20 21:19             ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-20 21:07 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: H. Peter Anvin, x86, LKML, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 09/20/17 10:38, Dmitry Vyukov wrote:
> >>
> >> I think we need just the frame itself and RSP pointing below this
> >> frame. If we don't have a frame, the CALL instruction will smash
> >> whatever RSP happens to point to. The compiler doesn't have to set up
> >> RSP to point below the used part of the stack in leaf functions.
> >>
> >
> > In the kernel it does.  Redzoning is not allowed in the kernel, because
> > interrupts or exceptions would also smash the redzone.
> 
> I see... But it's the same for user-space signals: the first thing a
> signal handler does is skip the redzone. I guess interrupt handlers
> should switch to an interrupt stack, which avoids smashing the redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience, frames in leaf functions can have a pretty large
> performance penalty. I wonder if we have considered changing the
> interrupt/exception handlers to avoid smashing redzones and disabling
> leaf frames?

Currently, on x86-64, I believe all exceptions have their own dedicated
stacks in the kernel, but IRQs still come in on the task's kernel stack.

Andy, do you know if there's a reason why IRQs don't use a dedicated IST
stack?

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 21:07           ` Josh Poimboeuf
@ 2017-09-20 21:19             ` Andy Lutomirski
  2017-09-21  8:12               ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2017-09-20 21:19 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Dmitry Vyukov, H. Peter Anvin, x86, LKML, Ingo Molnar,
	Thomas Gleixner, Andy Lutomirski, Linus Torvalds,
	Alexander Potapenko, Matthias Kaehlcke, Arnd Bergmann,
	Peter Zijlstra, Andrey Ryabinin



> On Sep 20, 2017, at 2:07 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>> 
>>>> I think we need just the frame itself and RSP pointing below this
>>>> frame. If we don't have a frame, the CALL instruction will smash
>>>> whatever RSP happens to point to. The compiler doesn't have to set up
>>>> RSP to point below the used part of the stack in leaf functions.
>>>> 
>>> 
>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>> interrupts or exceptions would also smash the redzone.
>> 
>> I see... But it's the same for user-space signals: the first thing a
>> signal handler does is skip the redzone. I guess interrupt handlers
>> should switch to an interrupt stack, which avoids smashing the redzone
>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>> In my experience, frames in leaf functions can have a pretty large
>> performance penalty. I wonder if we have considered changing the
>> interrupt/exception handlers to avoid smashing redzones and disabling
>> leaf frames?
> 
> Currently, on x86-64, I believe all exceptions have their own dedicated
> stacks in the kernel, but IRQs still come in on the task's kernel stack.
> 
> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
> stack?
> 

Because IST is awful due to recursion issues.  We immediately switch to an IRQ stack, though.

If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.

> -- 
> Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 21:19             ` Andy Lutomirski
@ 2017-09-21  8:12               ` Dmitry Vyukov
  2017-09-21 11:52                 ` Brian Gerst
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2017-09-21  8:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Josh Poimboeuf, H. Peter Anvin, x86, LKML, Ingo Molnar,
	Thomas Gleixner, Andy Lutomirski, Linus Torvalds,
	Alexander Potapenko, Matthias Kaehlcke, Arnd Bergmann,
	Peter Zijlstra, Andrey Ryabinin

On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>>>
>>>>> I think we need just the frame itself and RSP pointing below this
>>>>> frame. If we don't have a frame, the CALL instruction will smash
>>>>> whatever RSP happens to point to. The compiler doesn't have to set up
>>>>> RSP to point below the used part of the stack in leaf functions.
>>>>>
>>>>
>>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>>> interrupts or exceptions would also smash the redzone.
>>>
>>> I see... But it's the same for user-space signals: the first thing a
>>> signal handler does is skip the redzone. I guess interrupt handlers
>>> should switch to an interrupt stack, which avoids smashing the redzone
>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>> In my experience, frames in leaf functions can have a pretty large
>>> performance penalty. I wonder if we have considered changing the
>>> interrupt/exception handlers to avoid smashing redzones and disabling
>>> leaf frames?
>>
>> Currently, on x86-64, I believe all exceptions have their own dedicated
>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>
>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>> stack?
>>
>
> Because IST is awful due to recursion issues.  We immediately switch to an IRQ stack, though.
>
> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.

Thanks.

I guess it must be finite recursion, because we could not handle
infinite recursion with a finite stack. I think what solves it is simply:

sub $256, %rsp
... do stuff ...
add $256, %rsp

Don't know if it's applicable to interrupts or not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-21  8:12               ` Dmitry Vyukov
@ 2017-09-21 11:52                 ` Brian Gerst
  2017-09-21 12:18                   ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Gerst @ 2017-09-21 11:52 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andy Lutomirski, Josh Poimboeuf, H. Peter Anvin, x86, LKML,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski, Linus Torvalds,
	Alexander Potapenko, Matthias Kaehlcke, Arnd Bergmann,
	Peter Zijlstra, Andrey Ryabinin

On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>>>>
>>>>>> I think we need just the frame itself and RSP pointing below this
>>>>>> frame. If we don't have a frame, the CALL instruction will smash
>>>>>> whatever RSP happens to point to. The compiler doesn't have to set up
>>>>>> RSP to point below the used part of the stack in leaf functions.
>>>>>>
>>>>>
>>>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>>>> interrupts or exceptions would also smash the redzone.
>>>>
>>>> I see... But it's the same for user-space signals: the first thing a
>>>> signal handler does is skip the redzone. I guess interrupt handlers
>>>> should switch to an interrupt stack, which avoids smashing the redzone
>>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>>> In my experience, frames in leaf functions can have a pretty large
>>>> performance penalty. I wonder if we have considered changing the
>>>> interrupt/exception handlers to avoid smashing redzones and disabling
>>>> leaf frames?
>>>
>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>
>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>> stack?
>>>
>>
>> Because IST is awful due to recursion issues.  We immediately switch to an IRQ stack, though.
>>
>> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.
>
> Thanks.
>
> I guess it must be finite recursion, because we could not handle
> infinite recursion with a finite stack. I think what solves it is simply:
>
> sub $256, %rsp
> ... do stuff ...
> add $256, %rsp
>
> Don't know if it's applicable to interrupts or not.

No, it is not.  The processor pushes 5 or 6 words of data on the stack
(the IRET frame plus an error code for certain exceptions) before the
interrupt handler gets control.  So without using the IST for stack
switching on every interrupt, the redzone cannot be used in the kernel
as it will get smashed by the IRET frame.  In addition, since the
kernel's stack is limited in size, skipping 128 bytes on every
interrupt would overrun the stack faster.  The small gain from using
the redzone in the kernel is outweighed by these limitations.
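
For reference, a rough sketch of those words as the handler sees them
(names illustrative; lowest address first, and the error code is only
pushed for some exceptions):

  struct hw_frame_sketch {
          unsigned long error_code;       /* some exceptions only */
          unsigned long rip;
          unsigned long cs;
          unsigned long rflags;
          unsigned long rsp;              /* the interrupted RSP */
          unsigned long ss;               /* highest address, pushed first */
  };

All of it lands directly below the interrupted RSP, which is exactly
where a red zone would live.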

--
Brian Gerst

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-21 11:52                 ` Brian Gerst
@ 2017-09-21 12:18                   ` Dmitry Vyukov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2017-09-21 12:18 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, Josh Poimboeuf, H. Peter Anvin, x86, LKML,
	Ingo Molnar, Thomas Gleixner, Andy Lutomirski, Linus Torvalds,
	Alexander Potapenko, Matthias Kaehlcke, Arnd Bergmann,
	Peter Zijlstra, Andrey Ryabinin

On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerst <brgerst@gmail.com> wrote:
>>>>>>> I think we need just the frame itself and RSP pointing below this
>>>>>>> frame. If we don't have a frame, the CALL instruction will smash
>>>>>>> whatever RSP happens to point to. The compiler doesn't have to set up
>>>>>>> RSP to point below the used part of the stack in leaf functions.
>>>>>>>
>>>>>>
>>>>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>>>>> interrupts or exceptions would also smash the redzone.
>>>>>
>>>>> I see... But it's the same for user-space signals: the first thing a
>>>>> signal handler does is skip the redzone. I guess interrupt handlers
>>>>> should switch to an interrupt stack, which avoids smashing the redzone
>>>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>>>> In my experience, frames in leaf functions can have a pretty large
>>>>> performance penalty. I wonder if we have considered changing the
>>>>> interrupt/exception handlers to avoid smashing redzones and disabling
>>>>> leaf frames?
>>>>
>>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>>
>>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>>> stack?
>>>>
>>>
>>> Because IST is awful due to recursion issues.  We immediately switch to an IRQ stack, though.
>>>
>>> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.
>>
>> Thanks.
>>
>> I guess it must be finite recursion, because we could not handle
>> infinite recursion with a finite stack. I think what solves it is simply:
>>
>> sub $256, %rsp
>> ... do stuff ...
>> add $256, %rsp
>>
>> Don't know if it's applicable to interrupts or not.
>
> No, it is not.  The processor pushes 5 or 6 words of data on the stack
> (the IRET frame plus an error code for certain exceptions) before the
> interrupt handler gets control.  So without using the IST for stack
> switching on every interrupt, the redzone cannot be used in the kernel
> as it will get smashed by the IRET frame.  In addition, since the
> kernel's stack is limited in size, skipping 128 bytes on every
> interrupt would overrun the stack faster.  The small gain from using
> the redzone in the kernel is outweighed by these limitations.

I see, thanks for educating me.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-20 17:51     ` Josh Poimboeuf
@ 2017-09-21 15:35       ` Ingo Molnar
  2017-09-21 16:18         ` Josh Poimboeuf
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2017-09-21 15:35 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: H. Peter Anvin, x86, linux-kernel, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > For inline asm statements which have a CALL instruction, we list the
> > > stack pointer as a constraint to convince GCC to ensure the frame
> > > pointer is set up first:
> > > 
> > >   static inline void foo()
> > >   {
> > >   	register void *__sp asm(_ASM_SP);
> > >   	asm("call bar" : "+r" (__sp))
> > >   }
> > > 
> > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > 
> > > There's actually an easier way to achieve the same goal in GCC, without
> > > causing trouble for clang.  If we declare the stack pointer register
> > > variable as a global variable, and remove the constraint altogether,
> > > that convinces GCC to always set up the frame pointer before inserting
> > > *any* inline asm.
> > > 
> > > It basically acts as if *every* inline asm statement has a CALL
> > > instruction.  It's a bit overkill, but the performance impact should be
> > > negligible.
> > > 
> > 
> > Again, probably negligible, but why do we need a frame pointer just
> > because we have a call assembly instruction?
> 
> It's the frame pointer convention.  Without it, when dumping the stack
> from the called function, a function will get skipped in the stack
> trace: the frame that would have recorded a return address into it was
> never set up.

BTW., could we perhaps relax this and simply phase out the frame pointer on x86, 
and simplify all our assembly in a cycle or two? ORC unwinder is working out very 
well so far. Live kernel patching can use ORC data just fine, and nothing else 
actually relies on frame pointers, right?

That would give one more register to assembly code.

I realize that we just rewrote a whole bunch of assembly code... but that was the 
price for ORC, in a way.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
  2017-09-21 15:35       ` Ingo Molnar
@ 2017-09-21 16:18         ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2017-09-21 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, x86, linux-kernel, Thomas Gleixner,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra,
	Andrey Ryabinin

On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > > For inline asm statements which have a CALL instruction, we list the
> > > > stack pointer as a constraint to convince GCC to ensure the frame
> > > > pointer is set up first:
> > > > 
> > > >   static inline void foo()
> > > >   {
> > > >   	register void *__sp asm(_ASM_SP);
> > > >   	asm("call bar" : "+r" (__sp))
> > > >   }
> > > > 
> > > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > > 
> > > > There's actually an easier way to achieve the same goal in GCC, without
> > > > causing trouble for clang.  If we declare the stack pointer register
> > > > variable as a global variable, and remove the constraint altogether,
> > > > that convinces GCC to always set up the frame pointer before inserting
> > > > *any* inline asm.
> > > > 
> > > > It basically acts as if *every* inline asm statement has a CALL
> > > > instruction.  It's a bit overkill, but the performance impact should be
> > > > negligible.
> > > > 
> > > 
> > > Again, probably negligible, but why do we need a frame pointer just
> > > because we have a call assembly instruction?
> > 
> > It's the frame pointer convention.  Without it, when dumping the stack
> > from the called function, a function will get skipped in the stack
> > trace: the frame that would have recorded a return address into it was
> > never set up.
> 
> BTW., could we perhaps relax this and simply phase out the frame pointer on x86, 
> and simplify all our assembly in a cycle or two? ORC unwinder is working out very 
> well so far. Live kernel patching can use ORC data just fine, and nothing else 
> actually relies on frame pointers, right?
> 
> That would give one more register to assembly code.
> 
> I realize that we just rewrote a whole bunch of assembly code... but that was the 
> price for ORC, in a way.

I think ORC isn't *quite* ready for livepatch yet, though it's close.
We could probably make it ready in a cycle or two.

However, I'm not sure whether we would want to remove livepatch support
for frame pointers yet:

- There might be some embedded livepatch users who can't deal with the
  extra 3MB of RAM needed by ORC.  (Though this is purely theoretical, I
  don't actually know anybody with this problem.  I suppose we could
  float the idea on the livepatch/kpatch mailing lists and see if
  anybody objects.)

- Objtool's ORC generation is working fine now, but objtool maintenance
  is currently a little heavy-handed, and I'm the only person
  who knows how to do it.  I've got an idea brewing for how to improve
  its maintainability with the help of compiler plugins, but until then,
  if I get hit by a bus tomorrow, who's going to pick it up?  It's nice
  to have frame pointers as a backup solution for live patching.

But also, is this a problem that's even worth fixing?  Do we have any
assembly code that would be noticeably better off with that extra
register?  Most of the changes were:

- adding FRAME_BEGIN/FRAME_END to some asm functions;
- juggling register usage in a few crypto functions; and
- adding the stack pointer constraint to 15 or so inline asm functions.

I think those changes weren't all that disruptive to start with.
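
(For reference, the FRAME_BEGIN/FRAME_END annotations are just the
usual frame pointer save/setup pair; roughly, from asm/frame.h with
CONFIG_FRAME_POINTER=y:

  #define FRAME_BEGIN                             \
          "push %" _ASM_BP "\n"                   \
          "mov %" _ASM_SP ", %" _ASM_BP "\n"
  #define FRAME_END "pop %" _ASM_BP "\n"

with equivalent .macro versions for .S files.)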

-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-09-21 16:18 UTC | newest]

Thread overview: 20+ messages
2017-09-19 18:45 [PATCH 0/2] x86: Fix inline asm call constraints for clang Josh Poimboeuf
2017-09-19 18:45 ` [PATCH 1/2] objtool: Handle another GCC stack pointer adjustment bug Josh Poimboeuf
2017-09-19 18:45 ` [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Josh Poimboeuf
2017-09-19 21:55   ` Alexander Potapenko
2017-09-19 22:25     ` Alexander Potapenko
2017-09-20  1:47       ` Josh Poimboeuf
2017-09-20  1:18   ` Josh Poimboeuf
2017-09-20 17:54     ` Josh Poimboeuf
2017-09-20 17:32   ` H. Peter Anvin
2017-09-20 17:38     ` Dmitry Vyukov
2017-09-20 17:46       ` H. Peter Anvin
2017-09-20 18:01         ` Dmitry Vyukov
2017-09-20 21:07           ` Josh Poimboeuf
2017-09-20 21:19             ` Andy Lutomirski
2017-09-21  8:12               ` Dmitry Vyukov
2017-09-21 11:52                 ` Brian Gerst
2017-09-21 12:18                   ` Dmitry Vyukov
2017-09-20 17:51     ` Josh Poimboeuf
2017-09-21 15:35       ` Ingo Molnar
2017-09-21 16:18         ` Josh Poimboeuf
