linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions
@ 2017-08-31 14:11 Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 1/4] x86/paravirt: Fix output constraint macro names Josh Poimboeuf
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 14:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

I'm making this an RFC because I'm not sure whether there's a better
solution (like maybe trying to convince the clang folks to support the
undocumented GCC syntax for this).  Opinions and better ideas welcome.


For inline asm statements which have a "call" instruction, we have to
list the stack pointer as a constraint to convince GCC to set up the
frame pointer before inserting the asm:

  register void *__sp asm(_ASM_SP);
  asm("call foo" : "+r" (__sp))

There's no known way to do the same thing with Clang.  In fact, using
the stack pointer constraint causes Clang to corrupt the stack pointer.

Patch 4/4 adds a wrapper around such inline asm statements.

Before:
  register void *__sp asm(_ASM_SP);
  asm("call foo" : outputs, "+r" (__sp) : inputs : clobbers);

After:
  ASM_CALL("call foo", outputs, inputs, clobbers);
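
For example, with some made-up operands filled in (using the OUTPUTS(),
INPUTS() and CLOBBERS() grouping macros added later in this series):

  ASM_CALL("call foo",
           OUTPUTS("=a" (ret)),
           INPUTS("D" (arg)),
           CLOBBERS("memory"));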


A limitation of the wrapper is that it doesn't support positional
operand names ("%0") or matching constraints ("0" (foo)).


The benefits of the wrapper are:

- Allows clang-built kernels to boot.

- Removes the stack pointer constraint with CONFIG_FRAME_POINTER=n
  (which may soon become the default).

- Gives us a single place to change if we ever get a documented way to
  do this.


NOTE: Patch 4/4 triggers a bug in the sparse preprocessor which causes
      the kbuild robot to complain.  I've reported the bug to the sparse
      mailing list.


Josh Poimboeuf (4):
  x86/paravirt: Fix output constraint macro names
  x86/asm: Convert some inline asm positional operands to named operands
  x86/asm: Make alternative macro interfaces more clear and consistent
  x86/asm: Use ASM_CALL() macro for inline asm statements with call
    instructions

 arch/x86/include/asm/alternative.h               |  72 ++++-------
 arch/x86/include/asm/apic.h                      |   7 +-
 arch/x86/include/asm/arch_hweight.h              |  14 +-
 arch/x86/include/asm/atomic64_32.h               | 101 +++++++++------
 arch/x86/include/asm/cmpxchg_32.h                |  20 +--
 arch/x86/include/asm/mshyperv.h                  |  51 ++++----
 arch/x86/include/asm/page_64.h                   |   6 +-
 arch/x86/include/asm/paravirt_types.h            |  86 ++++++-------
 arch/x86/include/asm/percpu.h                    |  13 +-
 arch/x86/include/asm/preempt.h                   |  15 +--
 arch/x86/include/asm/processor.h                 |  41 +++---
 arch/x86/include/asm/rwsem.h                     | 155 ++++++++++++-----------
 arch/x86/include/asm/special_insns.h             |   7 +-
 arch/x86/include/asm/uaccess.h                   |  24 ++--
 arch/x86/include/asm/uaccess_64.h                |  16 +--
 arch/x86/include/asm/xen/hypercall.h             |  59 +++++----
 arch/x86/kvm/emulate.c                           |   9 +-
 arch/x86/kvm/vmx.c                               |  17 +--
 arch/x86/mm/fault.c                              |  13 +-
 include/linux/compiler-clang.h                   |   2 +
 include/linux/compiler-gcc.h                     |  19 +++
 include/linux/compiler.h                         |  34 +++++
 tools/objtool/Documentation/stack-validation.txt |  19 +--
 23 files changed, 427 insertions(+), 373 deletions(-)

-- 
2.13.5


* [RFC PATCH 1/4] x86/paravirt: Fix output constraint macro names
  2017-08-31 14:11 [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions Josh Poimboeuf
@ 2017-08-31 14:11 ` Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 2/4] x86/asm: Convert some inline asm positional operands to named operands Josh Poimboeuf
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 14:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

Some of the paravirt '*_CLOBBERS' macros refer to output constraints
instead of clobbers, which makes the code extra confusing.  Rename the
output constraint related macros to '*_OUTPUTS'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/paravirt_types.h | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 9ffc36bfe4cd..0d793ef08e3d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -479,12 +479,12 @@ int paravirt_disable_iospace(void);
 #define PVOP_CALL_ARG2(x)		"d" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)		"c" ((unsigned long)(x))
 
-#define PVOP_VCALL_CLOBBERS		"=a" (__eax), "=d" (__edx),	\
+#define PVOP_VCALL_OUTPUTS		"=a" (__eax), "=d" (__edx),	\
 					"=c" (__ecx)
-#define PVOP_CALL_CLOBBERS		PVOP_VCALL_CLOBBERS
+#define PVOP_CALL_OUTPUTS		PVOP_VCALL_OUTPUTS
 
-#define PVOP_VCALLEE_CLOBBERS		"=a" (__eax), "=d" (__edx)
-#define PVOP_CALLEE_CLOBBERS		PVOP_VCALLEE_CLOBBERS
+#define PVOP_VCALLEE_OUTPUTS		"=a" (__eax), "=d" (__edx)
+#define PVOP_CALLEE_OUTPUTS		PVOP_VCALLEE_OUTPUTS
 
 #define EXTRA_CLOBBERS
 #define VEXTRA_CLOBBERS
@@ -501,14 +501,14 @@ int paravirt_disable_iospace(void);
 #define PVOP_CALL_ARG3(x)		"d" ((unsigned long)(x))
 #define PVOP_CALL_ARG4(x)		"c" ((unsigned long)(x))
 
-#define PVOP_VCALL_CLOBBERS	"=D" (__edi),				\
+#define PVOP_VCALL_OUTPUTS	"=D" (__edi),				\
 				"=S" (__esi), "=d" (__edx),		\
 				"=c" (__ecx)
-#define PVOP_CALL_CLOBBERS	PVOP_VCALL_CLOBBERS, "=a" (__eax)
+#define PVOP_CALL_OUTPUTS	PVOP_VCALL_OUTPUTS, "=a" (__eax)
 
 /* void functions are still allowed [re]ax for scratch */
-#define PVOP_VCALLEE_CLOBBERS	"=a" (__eax)
-#define PVOP_CALLEE_CLOBBERS	PVOP_VCALLEE_CLOBBERS
+#define PVOP_VCALLEE_OUTPUTS	"=a" (__eax)
+#define PVOP_CALLEE_OUTPUTS	PVOP_VCALLEE_OUTPUTS
 
 #define EXTRA_CLOBBERS	 , "r8", "r9", "r10", "r11"
 #define VEXTRA_CLOBBERS	 , "rax", "r8", "r9", "r10", "r11"
@@ -532,7 +532,7 @@ int paravirt_disable_iospace(void);
 	})
 
 
-#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr,		\
+#define ____PVOP_CALL(rettype, op, clbr, outputs, extra_clbr,		\
 		      pre, post, ...)					\
 	({								\
 		rettype __ret;						\
@@ -544,7 +544,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : outputs, "+r" (__sp)		\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -554,7 +554,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : outputs, "+r" (__sp)		\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -565,23 +565,23 @@ int paravirt_disable_iospace(void);
 	})
 
 #define __PVOP_CALL(rettype, op, pre, post, ...)			\
-	____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,	\
+	____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_OUTPUTS,		\
 		      EXTRA_CLOBBERS, pre, post, ##__VA_ARGS__)
 
 #define __PVOP_CALLEESAVE(rettype, op, pre, post, ...)			\
 	____PVOP_CALL(rettype, op.func, CLBR_RET_REG,			\
-		      PVOP_CALLEE_CLOBBERS, ,				\
+		      PVOP_CALLEE_OUTPUTS, ,				\
 		      pre, post, ##__VA_ARGS__)
 
 
-#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, pre, post, ...)	\
+#define ____PVOP_VCALL(op, clbr, outputs, extra_clbr, pre, post, ...)	\
 	({								\
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
 		asm volatile(pre					\
 			     paravirt_alt(PARAVIRT_CALL)		\
 			     post					\
-			     : call_clbr, "+r" (__sp)			\
+			     : outputs, "+r" (__sp)			\
 			     : paravirt_type(op),			\
 			       paravirt_clobber(clbr),			\
 			       ##__VA_ARGS__				\
@@ -589,13 +589,13 @@ int paravirt_disable_iospace(void);
 	})
 
 #define __PVOP_VCALL(op, pre, post, ...)				\
-	____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS,		\
+	____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_OUTPUTS,		\
 		       VEXTRA_CLOBBERS,					\
 		       pre, post, ##__VA_ARGS__)
 
 #define __PVOP_VCALLEESAVE(op, pre, post, ...)				\
 	____PVOP_VCALL(op.func, CLBR_RET_REG,				\
-		      PVOP_VCALLEE_CLOBBERS, ,				\
+		      PVOP_VCALLEE_OUTPUTS, ,				\
 		      pre, post, ##__VA_ARGS__)
 
 
-- 
2.13.5


* [RFC PATCH 2/4] x86/asm: Convert some inline asm positional operands to named operands
  2017-08-31 14:11 [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 1/4] x86/paravirt: Fix output constraint macro names Josh Poimboeuf
@ 2017-08-31 14:11 ` Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions Josh Poimboeuf
  3 siblings, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 14:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

Convert some inline asm positional operands (e.g., %0, %1) to named
operands.

This is needed in preparation for the new ASM_CALL() macro, which won't
support positional operands.

It's also generally a good idea, as it makes the code more readable and
robust.
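
For example (a made-up snippet, not taken from this patch), a
positional matching constraint like:

  asm("add %1, %0" : "=r" (x) : "r" (y), "0" (x));

can be written with named operands as:

  asm("add %[val], %[res]" : [res] "+r" (x) : [val] "r" (y));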

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/apic.h           |  6 +--
 arch/x86/include/asm/cmpxchg_32.h     |  4 +-
 arch/x86/include/asm/mshyperv.h       | 19 +++----
 arch/x86/include/asm/page_64.h        |  5 +-
 arch/x86/include/asm/paravirt_types.h |  4 +-
 arch/x86/include/asm/percpu.h         | 11 ++--
 arch/x86/include/asm/processor.h      | 20 ++++----
 arch/x86/include/asm/rwsem.h          | 94 ++++++++++++++++++-----------------
 arch/x86/include/asm/special_insns.h  |  6 +--
 arch/x86/include/asm/uaccess.h        |  6 +--
 arch/x86/include/asm/uaccess_64.h     | 16 +++---
 11 files changed, 99 insertions(+), 92 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 5f01671c68f2..5a7e0eb38350 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -93,9 +93,9 @@ static inline void native_apic_mem_write(u32 reg, u32 v)
 {
 	volatile u32 *addr = (volatile u32 *)(APIC_BASE + reg);
 
-	alternative_io("movl %0, %P1", "xchgl %0, %P1", X86_BUG_11AP,
-		       ASM_OUTPUT2("=r" (v), "=m" (*addr)),
-		       ASM_OUTPUT2("0" (v), "m" (*addr)));
+	alternative_io("movl %[val], %P[reg]", "xchgl %[val], %P[reg]",
+		       X86_BUG_11AP,
+		       ASM_OUTPUT2([val] "+r" (v), [reg] "+m" (*addr)));
 }
 
 static inline u32 native_apic_mem_read(u32 reg)
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index e4959d023af8..8154a317899f 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -85,7 +85,7 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
 			"lock; cmpxchg8b (%%esi)" ,		\
 		       X86_FEATURE_CX8,				\
 		       "=A" (__ret),				\
-		       "S" ((ptr)), "0" (__old),		\
+		       "S" ((ptr)), "A" (__old),		\
 		       "b" ((unsigned int)__new),		\
 		       "c" ((unsigned int)(__new>>32))		\
 		       : "memory");				\
@@ -101,7 +101,7 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
 		       "cmpxchg8b (%%esi)" ,			\
 		       X86_FEATURE_CX8,				\
 		       "=A" (__ret),				\
-		       "S" ((ptr)), "0" (__old),		\
+		       "S" ((ptr)), "A" (__old),		\
 		       "b" ((unsigned int)__new),		\
 		       "c" ((unsigned int)(__new>>32))		\
 		       : "memory");				\
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 0d4b01c5e438..d0675d58fa32 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -183,11 +183,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("mov %4, %%r8\n"
-			     "call *%5"
+	__asm__ __volatile__("mov %[out], %%r8\n"
+			     "call *%[pg]"
 			     : "=a" (hv_status), "+r" (__sp),
 			       "+c" (control), "+d" (input_address)
-			     :  "r" (output_address), "m" (hv_hypercall_pg)
+			     :  [out] "r" (output_address),
+			        [pg] "m" (hv_hypercall_pg)
 			     : "cc", "memory", "r8", "r9", "r10", "r11");
 #else
 	u32 input_address_hi = upper_32_bits(input_address);
@@ -198,13 +199,13 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("call *%7"
+	__asm__ __volatile__("call *%[pg]"
 			     : "=A" (hv_status),
 			       "+c" (input_address_lo), "+r" (__sp)
 			     : "A" (control),
 			       "b" (input_address_hi),
 			       "D"(output_address_hi), "S"(output_address_lo),
-			       "m" (hv_hypercall_pg)
+			       [pg] "m" (hv_hypercall_pg)
 			     : "cc", "memory");
 #endif /* !x86_64 */
 	return hv_status;
@@ -226,10 +227,10 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 
 #ifdef CONFIG_X86_64
 	{
-		__asm__ __volatile__("call *%4"
+		__asm__ __volatile__("call *%[pg]"
 				     : "=a" (hv_status), "+r" (__sp),
 				       "+c" (control), "+d" (input1)
-				     : "m" (hv_hypercall_pg)
+				     : [pg] "m" (hv_hypercall_pg)
 				     : "cc", "r8", "r9", "r10", "r11");
 	}
 #else
@@ -237,13 +238,13 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 		u32 input1_hi = upper_32_bits(input1);
 		u32 input1_lo = lower_32_bits(input1);
 
-		__asm__ __volatile__ ("call *%5"
+		__asm__ __volatile__ ("call *%[pg]"
 				      : "=A"(hv_status),
 					"+c"(input1_lo),
 					"+r"(__sp)
 				      :	"A" (control),
 					"b" (input1_hi),
-					"m" (hv_hypercall_pg)
+					[pg] "m" (hv_hypercall_pg)
 				      : "cc", "edi", "esi");
 	}
 #endif
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index b50df06ad251..f7dbe752f80d 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,9 +44,8 @@ static inline void clear_page(void *page)
 	alternative_call_2(clear_page_orig,
 			   clear_page_rep, X86_FEATURE_REP_GOOD,
 			   clear_page_erms, X86_FEATURE_ERMS,
-			   "=D" (page),
-			   "0" (page)
-			   : "memory", "rax", "rcx");
+			   "+D" (page),
+			   ASM_NO_INPUT_CLOBBER("memory", "rax", "rcx"));
 }
 
 void copy_page(void *to, void *from);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0d793ef08e3d..a509259a3181 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -654,8 +654,8 @@ int paravirt_disable_iospace(void);
 #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4)				\
 	__PVOP_VCALL(op,						\
 		    "push %[_arg4];", "lea 4(%%esp),%%esp;",		\
-		    "0" ((u32)(arg1)), "1" ((u32)(arg2)),		\
-		    "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
+		    "a" ((u32)(arg1)), "d" ((u32)(arg2)),		\
+		    "c" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
 #else
 #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4)			\
 	__PVOP_CALL(rettype, op, "", "",				\
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9fa03604b2b3..2d1753758b0b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -493,11 +493,14 @@ do {									\
 	bool __ret;							\
 	typeof(pcp1) __o1 = (o1), __n1 = (n1);				\
 	typeof(pcp2) __o2 = (o2), __n2 = (n2);				\
-	alternative_io("leaq %P1,%%rsi\n\tcall this_cpu_cmpxchg16b_emu\n\t", \
-		       "cmpxchg16b " __percpu_arg(1) "\n\tsetz %0\n\t",	\
+	alternative_io("leaq %P[p1], %%rsi\n\t"				\
+		       "call this_cpu_cmpxchg16b_emu\n\t",		\
+		       "cmpxchg16b " __percpu_arg([p1]) "\n\t"		\
+		       "setz %[ret]\n\t",				\
 		       X86_FEATURE_CX16,				\
-		       ASM_OUTPUT2("=a" (__ret), "+m" (pcp1),		\
-				   "+m" (pcp2), "+d" (__o2)),		\
+		       ASM_OUTPUT2([ret] "=a" (__ret),			\
+				   [p1] "+m" (pcp1), "+m" (pcp2),	\
+				   "+d" (__o2)),			\
 		       "b" (__n1), "c" (__n2), "a" (__o1) : "rsi");	\
 	__ret;								\
 })
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..51cf1c7e9aca 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -692,18 +692,18 @@ static inline void sync_core(void)
 
 	asm volatile (
 		UNWIND_HINT_SAVE
-		"mov %%ss, %0\n\t"
-		"pushq %q0\n\t"
+		"mov %%ss, %[tmp]\n\t"
+		"pushq %q[tmp]\n\t"
 		"pushq %%rsp\n\t"
 		"addq $8, (%%rsp)\n\t"
 		"pushfq\n\t"
-		"mov %%cs, %0\n\t"
-		"pushq %q0\n\t"
+		"mov %%cs, %[tmp]\n\t"
+		"pushq %q[tmp]\n\t"
 		"pushq $1f\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+		: [tmp] "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
 #endif
 }
 
@@ -769,7 +769,7 @@ extern char			ignore_fpu_irq;
 # define BASE_PREFETCH		""
 # define ARCH_HAS_PREFETCH
 #else
-# define BASE_PREFETCH		"prefetcht0 %P1"
+# define BASE_PREFETCH		"prefetcht0 %P[x]"
 #endif
 
 /*
@@ -780,9 +780,9 @@ extern char			ignore_fpu_irq;
  */
 static inline void prefetch(const void *x)
 {
-	alternative_input(BASE_PREFETCH, "prefetchnta %P1",
+	alternative_input(BASE_PREFETCH, "prefetchnta %P[x]",
 			  X86_FEATURE_XMM,
-			  "m" (*(const char *)x));
+			  [x] "m" (*(const char *)x));
 }
 
 /*
@@ -792,9 +792,9 @@ static inline void prefetch(const void *x)
  */
 static inline void prefetchw(const void *x)
 {
-	alternative_input(BASE_PREFETCH, "prefetchw %P1",
+	alternative_input(BASE_PREFETCH, "prefetchw %P[x]",
 			  X86_FEATURE_3DNOWPREFETCH,
-			  "m" (*(const char *)x));
+			  [x] "m" (*(const char *)x));
 }
 
 static inline void spin_lock_prefetch(const void *x)
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..b715152fb2b5 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -63,14 +63,14 @@
 static inline void __down_read(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX _ASM_INC "(%1)\n\t"
+		     LOCK_PREFIX _ASM_INC "(%[sem])\n\t"
 		     /* adds 0x00000001 */
-		     "  jns        1f\n"
-		     "  call call_rwsem_down_read_failed\n"
+		     "  jns        1f\n\t"
+		     "  call call_rwsem_down_read_failed\n\t"
 		     "1:\n\t"
 		     "# ending down_read\n\t"
 		     : "+m" (sem->count)
-		     : "a" (sem)
+		     : [sem] "a" (sem)
 		     : "memory", "cc");
 }
 
@@ -81,17 +81,18 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 {
 	long result, tmp;
 	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %0,%1\n\t"
+		     "  mov          %[count],%[result]\n\t"
 		     "1:\n\t"
-		     "  mov          %1,%2\n\t"
-		     "  add          %3,%2\n\t"
+		     "  mov          %[result],%[tmp]\n\t"
+		     "  add          %[bias],%[tmp]\n\t"
 		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
+		     LOCK_PREFIX "  cmpxchg  %[tmp],%[count]\n\t"
 		     "  jnz	     1b\n\t"
 		     "2:\n\t"
 		     "# ending __down_read_trylock\n\t"
-		     : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
-		     : "i" (RWSEM_ACTIVE_READ_BIAS)
+		     : [count] "+m" (sem->count), [result] "=&a" (result),
+		       [tmp] "=&r" (tmp)
+		     : [bias] "i" (RWSEM_ACTIVE_READ_BIAS)
 		     : "memory", "cc");
 	return result >= 0;
 }
@@ -99,25 +100,27 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-#define ____down_write(sem, slow_path)			\
-({							\
-	long tmp;					\
-	struct rw_semaphore* ret;			\
-	register void *__sp asm(_ASM_SP);		\
-							\
-	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %1,(%4)\n\t"	\
-		     /* adds 0xffff0001, returns the old value */ \
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
-		     /* was the active mask 0 before? */\
-		     "  jz        1f\n"			\
-		     "  call " slow_path "\n"		\
-		     "1:\n"				\
-		     "# ending down_write"		\
-		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
-		     : "memory", "cc");			\
-	ret;						\
+#define ____down_write(sem, slow_path)					\
+({									\
+	long tmp;							\
+	struct rw_semaphore* ret;					\
+	register void *__sp asm(_ASM_SP);				\
+									\
+	asm volatile("# beginning down_write\n\t"			\
+		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"	\
+		     /* adds 0xffff0001, returns the old value */	\
+		     "  test      " __ASM_SEL_RAW(%w,%k) "[tmp],"	\
+				    __ASM_SEL_RAW(%w,%k) "[tmp]\n\t"	\
+		     /* was the active mask 0 before? */		\
+		     "  jz        1f\n"					\
+		     "  call " slow_path "\n\t"				\
+		     "1:\n\t"						\
+		     "# ending down_write\n\t"				\
+		     : "+m" (sem->count), [tmp] "=d" (tmp), "=a" (ret),	\
+		       "+r" (__sp)					\
+		     : [sem] "a" (sem), "d" (RWSEM_ACTIVE_WRITE_BIAS)	\
+		     : "memory", "cc");					\
+	ret;								\
 })
 
 static inline void __down_write(struct rw_semaphore *sem)
@@ -141,21 +144,22 @@ static inline bool __down_write_trylock(struct rw_semaphore *sem)
 	bool result;
 	long tmp0, tmp1;
 	asm volatile("# beginning __down_write_trylock\n\t"
-		     "  mov          %0,%1\n\t"
+		     "  mov          %[count],%[tmp0]\n\t"
 		     "1:\n\t"
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
+		     "  test " __ASM_SEL_RAW(%w,%k) "[tmp0],"
+			       __ASM_SEL_RAW(%w,%k) "[tmp0]\n\t"
 		     /* was the active mask 0 before? */
 		     "  jnz          2f\n\t"
-		     "  mov          %1,%2\n\t"
-		     "  add          %4,%2\n\t"
-		     LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
+		     "  mov          %[tmp0],%[tmp1]\n\t"
+		     "  add          %[bias],%[tmp1]\n\t"
+		     LOCK_PREFIX "  cmpxchg  %[tmp1],%[count]\n\t"
 		     "  jnz	     1b\n\t"
 		     "2:\n\t"
 		     CC_SET(e)
 		     "# ending __down_write_trylock\n\t"
-		     : "+m" (sem->count), "=&a" (tmp0), "=&r" (tmp1),
-		       CC_OUT(e) (result)
-		     : "er" (RWSEM_ACTIVE_WRITE_BIAS)
+		     : [count] "+m" (sem->count), [tmp0] "=&a" (tmp0),
+		       [tmp1] "=&r" (tmp1), CC_OUT(e) (result)
+		     : [bias] "er" (RWSEM_ACTIVE_WRITE_BIAS)
 		     : "memory");
 	return result;
 }
@@ -167,14 +171,14 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	long tmp;
 	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
+		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
 		     /* subtracts 1, returns the old value */
 		     "  jns        1f\n\t"
 		     "  call call_rwsem_wake\n" /* expects old value in %edx */
 		     "1:\n"
 		     "# ending __up_read\n"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (-RWSEM_ACTIVE_READ_BIAS)
+		     : "+m" (sem->count), [tmp] "=d" (tmp)
+		     : [sem] "a" (sem), "d" (-RWSEM_ACTIVE_READ_BIAS)
 		     : "memory", "cc");
 }
 
@@ -185,14 +189,14 @@ static inline void __up_write(struct rw_semaphore *sem)
 {
 	long tmp;
 	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
+		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
 		     /* subtracts 0xffff0001, returns the old value */
 		     "  jns        1f\n\t"
 		     "  call call_rwsem_wake\n" /* expects old value in %edx */
 		     "1:\n\t"
 		     "# ending __up_write\n"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
+		     : "+m" (sem->count), [tmp] "=d" (tmp)
+		     : [sem] "a" (sem), "d" (-RWSEM_ACTIVE_WRITE_BIAS)
 		     : "memory", "cc");
 }
 
@@ -202,7 +206,7 @@ static inline void __up_write(struct rw_semaphore *sem)
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%2,(%1)\n\t"
+		     LOCK_PREFIX _ASM_ADD "%[bias],(%[sem])\n\t"
 		     /*
 		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
 		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
@@ -212,7 +216,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 		     "1:\n\t"
 		     "# ending __downgrade_write\n"
 		     : "+m" (sem->count)
-		     : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
+		     : [sem] "a" (sem), [bias] "er" (-RWSEM_WAITING_BIAS)
 		     : "memory", "cc");
 }
 
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 9efaabf5b54b..aeee6517ccc6 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -216,10 +216,10 @@ static inline void clflush(volatile void *__p)
 
 static inline void clflushopt(volatile void *__p)
 {
-	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
-		       ".byte 0x66; clflush %P0",
+	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P[p]",
+		       ".byte 0x66; clflush %P[p]",
 		       X86_FEATURE_CLFLUSHOPT,
-		       "+m" (*(volatile char __force *)__p));
+		       [p] "+m" (*(volatile char __force *)__p));
 }
 
 static inline void clwb(volatile void *__p)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..12fb37310872 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -169,16 +169,16 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
-	asm volatile("call __get_user_%P4"				\
+	asm volatile("call __get_user_%P[size]"				\
 		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
-		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
+		     : "a" (ptr), [size] "i" (sizeof(*(ptr))));		\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
 	__builtin_expect(__ret_gu, 0);					\
 })
 
 #define __put_user_x(size, x, ptr, __ret_pu)			\
 	asm volatile("call __put_user_" #size : "=a" (__ret_pu)	\
-		     : "0" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
+		     : "a" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 
 
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index b16f6a1d8b26..e09f71424795 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -34,14 +34,14 @@ copy_user_generic(void *to, const void *from, unsigned len)
 	 * Otherwise, use copy_user_generic_unrolled.
 	 */
 	alternative_call_2(copy_user_generic_unrolled,
-			 copy_user_generic_string,
-			 X86_FEATURE_REP_GOOD,
-			 copy_user_enhanced_fast_string,
-			 X86_FEATURE_ERMS,
-			 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
-				     "=d" (len)),
-			 "1" (to), "2" (from), "3" (len)
-			 : "memory", "rcx", "r8", "r9", "r10", "r11");
+			   copy_user_generic_string,
+			   X86_FEATURE_REP_GOOD,
+			   copy_user_enhanced_fast_string,
+			   X86_FEATURE_ERMS,
+			   ASM_OUTPUT2("=a" (ret), "+D" (to), "+S" (from),
+				       "+d" (len)),
+			   ASM_NO_INPUT_CLOBBER("memory", "rcx", "r8", "r9",
+						"r10", "r11"));
 	return ret;
 }
 
-- 
2.13.5


* [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 14:11 [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 1/4] x86/paravirt: Fix output constraint macro names Josh Poimboeuf
  2017-08-31 14:11 ` [RFC PATCH 2/4] x86/asm: Convert some inline asm positional operands to named operands Josh Poimboeuf
@ 2017-08-31 14:11 ` Josh Poimboeuf
  2017-08-31 16:11   ` Linus Torvalds
  2017-08-31 14:11 ` [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions Josh Poimboeuf
  3 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 14:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

The alternative code has some macros which are used for wrapping inline
asm statements.  These macros can be confusing:

- Some of them require the caller to start their positional operands at
  '%1' instead of '%0'.

- There are multiple variants of the macros which basically do the same
  thing, but their interfaces are subtly different.

- There's an optional ASM_OUTPUT2() macro which is used for grouping
  output constraint arguments.  This macro is poorly named as it doesn't
  clearly describe its purpose.  And it's only used some of the time
  (when there's more than one output constraint), making the interface
  more confusing.

- There's no corresponding ASM_INPUT2() macro for inputs, so it's not
  always intuitive to figure out which arguments are the outputs and
  which ones are the inputs.

- The ASM_NO_INPUT_CLOBBER() macro is also confusing unless you know
  what it does.

Make the following changes:

- Give alternative_io(), alternative_call(), and alternative_call_2()
  consistent interfaces.  The constraints and clobbers are now grouped
  with the OUTPUTS(), INPUTS(), and CLOBBERS() macros (see the example
  below).

- Get rid of the alternative_input() macro and replace it with a more
  general-purpose version of alternative_io().

- Also get rid of alternative_input_2() since it's not used anywhere.
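
As an illustrative example (the instruction strings and feature bit are
made up), a call site like:

  alternative_io("insn1", "insn2", X86_FEATURE_FOO,
                 ASM_OUTPUT2("=a" (ret), "+m" (var)),
                 "r" (in)
                 : "memory");

now looks like:

  alternative_io("insn1", "insn2", X86_FEATURE_FOO,
                 OUTPUTS("=a" (ret), "+m" (var)),
                 INPUTS("r" (in)),
                 CLOBBERS("memory"));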

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/alternative.h   |  74 ++++++++++---------------
 arch/x86/include/asm/apic.h          |   3 +-
 arch/x86/include/asm/atomic64_32.h   | 101 ++++++++++++++++++++++-------------
 arch/x86/include/asm/cmpxchg_32.h    |  20 +++----
 arch/x86/include/asm/page_64.h       |   5 +-
 arch/x86/include/asm/percpu.h        |   8 +--
 arch/x86/include/asm/processor.h     |  14 ++---
 arch/x86/include/asm/special_insns.h |   3 +-
 arch/x86/include/asm/uaccess_64.h    |   8 +--
 include/linux/compiler.h             |  29 ++++++++++
 10 files changed, 155 insertions(+), 110 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..53f18258c86f 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -171,43 +171,37 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
 
 /*
- * Alternative inline assembly with input.
+ * Alternative inline assembly with user-specified constraints.  Use the
+ * OUTPUTS(), INPUTS(), and CLOBBERS() macros to combine the arguments as
+ * needed.
  *
  * Pecularities:
  * No memory clobber here.
- * Argument numbers start with 1.
  * Best is to use constraints that are fixed size (like (%1) ... "r")
  * If you use variable sized constraints like "m" or "g" in the
  * replacement make sure to pad to the worst case length.
  * Leaving an unused argument 0 to keep API compatibility.
  */
-#define alternative_input(oldinstr, newinstr, feature, input...)	\
+#define alternative_io(oldinstr, newinstr, feature, outputs, inputs,	\
+		       clobbers...)					\
 	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)		\
-		: : "i" (0), ## input)
+		      : outputs						\
+		      : inputs						\
+		      CLOBBERS_APPEND(clobbers))
 
 /*
- * This is similar to alternative_input. But it has two features and
- * respective instructions.
+ * Like alternative_io, but for replacing a direct call with another one.
  *
- * If CPU has feature2, newinstr2 is used.
- * Otherwise, if CPU has feature1, newinstr1 is used.
- * Otherwise, oldinstr is used.
+ * Positional operand names ("%0") and constraints ("0" (foo)) are not allowed.
  */
-#define alternative_input_2(oldinstr, newinstr1, feature1, newinstr2,	     \
-			   feature2, input...)				     \
-	asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1,	     \
-		newinstr2, feature2)					     \
-		: : "i" (0), ## input)
-
-/* Like alternative_input, but with a single output argument */
-#define alternative_io(oldinstr, newinstr, feature, output, input...)	\
-	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)		\
-		: output : "i" (0), ## input)
-
-/* Like alternative_io, but for replacing a direct call with another one. */
-#define alternative_call(oldfunc, newfunc, feature, output, input...)	\
-	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
-		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
+#define alternative_call(oldfunc, newfunc, feature, outputs, inputs,	\
+			 clobbers...)					\
+	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]",	\
+				  feature),				\
+		      : outputs						\
+		      : [old] "i" (oldfunc), [new] "i" (newfunc)	\
+		        ARGS_APPEND(inputs)				\
+		      CLOBBERS_APPEND(clobbers))
 
 /*
  * Like alternative_call, but there are two features and respective functions.
@@ -215,29 +209,19 @@ static inline int alternatives_text_reserved(void *start, void *end)
  * Otherwise, if CPU has feature1, function1 is used.
  * Otherwise, old function is used.
  */
-#define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,   \
-			   output, input...)				      \
-{									      \
-	register void *__sp asm(_ASM_SP);				      \
-	asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
-		"call %P[new2]", feature2)				      \
-		: output, "+r" (__sp)					      \
-		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
-		  [new2] "i" (newfunc2), ## input);			      \
+#define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2,	\
+			   feature2, outputs, inputs, clobbers...)	\
+{									\
+	register void *__sp asm(_ASM_SP);				\
+	asm volatile (ALTERNATIVE_2("call %P[old]",			\
+				    "call %P[new1]", feature1,		\
+				    "call %P[new2]", feature2)		\
+		      : "+r" (__sp) ARGS_APPEND(outputs)		\
+		      : [old] "i" (oldfunc), [new1] "i" (newfunc1),	\
+			[new2] "i" (newfunc2) ARGS_APPEND(inputs)	\
+		      CLOBBERS_APPEND(clobbers));			\
 }
 
-/*
- * use this macro(s) if you need more than one output parameter
- * in alternative_io
- */
-#define ASM_OUTPUT2(a...) a
-
-/*
- * use this macro if you need clobbers but no inputs in
- * alternative_{input,io,call}()
- */
-#define ASM_NO_INPUT_CLOBBER(clbr...) "i" (0) : clbr
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_ALTERNATIVE_H */
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 5a7e0eb38350..77d743d10b21 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -95,7 +95,8 @@ static inline void native_apic_mem_write(u32 reg, u32 v)
 
 	alternative_io("movl %[val], %P[reg]", "xchgl %[val], %P[reg]",
 		       X86_BUG_11AP,
-		       ASM_OUTPUT2([val] "+r" (v), [reg] "+m" (*addr)));
+		       OUTPUTS([val] "+r" (v), [reg] "+m" (*addr)),
+		       INPUTS());
 }
 
 static inline u32 native_apic_mem_read(u32 reg)
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 9e206f31ce2a..7ab0efe8a13d 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -22,15 +22,19 @@ typedef struct {
 #endif
 
 #ifdef CONFIG_X86_CMPXCHG64
-#define __alternative_atomic64(f, g, out, in...) \
-	asm volatile("call %P[func]" \
-		     : out : [func] "i" (atomic64_##g##_cx8), ## in)
+#define __alternative_atomic64(f, g, outputs, inputs, clobbers...)	\
+	asm volatile("call %P[func]"					\
+		     : outputs						\
+		     : [func] "i" (atomic64_##g##_cx8)			\
+		       ARGS_APPEND(inputs)				\
+		     CLOBBERS_APPEND(clobbers))
 
 #define ATOMIC64_DECL(sym) ATOMIC64_DECL_ONE(sym##_cx8)
 #else
-#define __alternative_atomic64(f, g, out, in...) \
-	alternative_call(atomic64_##f##_386, atomic64_##g##_cx8, \
-			 X86_FEATURE_CX8, ASM_OUTPUT2(out), ## in)
+#define __alternative_atomic64(f, g, outputs, inputs, clobbers...)	\
+	alternative_call(atomic64_##f##_386, atomic64_##g##_cx8,	\
+			 X86_FEATURE_CX8,				\
+			 OUTPUTS(outputs), INPUTS(inputs), clobbers)
 
 #define ATOMIC64_DECL(sym) ATOMIC64_DECL_ONE(sym##_cx8); \
 	ATOMIC64_DECL_ONE(sym##_386)
@@ -41,8 +45,9 @@ ATOMIC64_DECL_ONE(inc_386);
 ATOMIC64_DECL_ONE(dec_386);
 #endif
 
-#define alternative_atomic64(f, out, in...) \
-	__alternative_atomic64(f, f, ASM_OUTPUT2(out), ## in)
+#define alternative_atomic64(f, outputs, inputs, clobbers...)		\
+	__alternative_atomic64(f, f, OUTPUTS(outputs), INPUTS(inputs),	\
+			       clobbers)
 
 ATOMIC64_DECL(read);
 ATOMIC64_DECL(set);
@@ -88,9 +93,10 @@ static inline long long atomic64_xchg(atomic64_t *v, long long n)
 	long long o;
 	unsigned high = (unsigned)(n >> 32);
 	unsigned low = (unsigned)n;
-	alternative_atomic64(xchg, "=&A" (o),
-			     "S" (v), "b" (low), "c" (high)
-			     : "memory");
+	alternative_atomic64(xchg,
+			     OUTPUTS("=&A" (o)),
+			     INPUTS("S" (v), "b" (low), "c" (high)),
+			     CLOBBERS("memory"));
 	return o;
 }
 
@@ -105,9 +111,10 @@ static inline void atomic64_set(atomic64_t *v, long long i)
 {
 	unsigned high = (unsigned)(i >> 32);
 	unsigned low = (unsigned)i;
-	alternative_atomic64(set, /* no output */,
-			     "S" (v), "b" (low), "c" (high)
-			     : "eax", "edx", "memory");
+	alternative_atomic64(set,
+			     OUTPUTS(),
+			     INPUTS("S" (v), "b" (low), "c" (high)),
+			     CLOBBERS("eax", "edx", "memory"));
 }
 
 /**
@@ -119,7 +126,10 @@ static inline void atomic64_set(atomic64_t *v, long long i)
 static inline long long atomic64_read(const atomic64_t *v)
 {
 	long long r;
-	alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
+	alternative_atomic64(read,
+			     OUTPUTS("=&A" (r)),
+			     INPUTS("c" (v)),
+			     CLOBBERS("memory"));
 	return r;
  }
 
@@ -133,8 +143,9 @@ static inline long long atomic64_read(const atomic64_t *v)
 static inline long long atomic64_add_return(long long i, atomic64_t *v)
 {
 	alternative_atomic64(add_return,
-			     ASM_OUTPUT2("+A" (i), "+c" (v)),
-			     ASM_NO_INPUT_CLOBBER("memory"));
+			     OUTPUTS("+A" (i), "+c" (v)),
+			     INPUTS(),
+			     CLOBBERS("memory"));
 	return i;
 }
 
@@ -144,24 +155,29 @@ static inline long long atomic64_add_return(long long i, atomic64_t *v)
 static inline long long atomic64_sub_return(long long i, atomic64_t *v)
 {
 	alternative_atomic64(sub_return,
-			     ASM_OUTPUT2("+A" (i), "+c" (v)),
-			     ASM_NO_INPUT_CLOBBER("memory"));
+			     OUTPUTS("+A" (i), "+c" (v)),
+			     INPUTS(),
+			     CLOBBERS("memory"));
 	return i;
 }
 
 static inline long long atomic64_inc_return(atomic64_t *v)
 {
 	long long a;
-	alternative_atomic64(inc_return, "=&A" (a),
-			     "S" (v) : "memory", "ecx");
+	alternative_atomic64(inc_return,
+			     OUTPUTS("=&A" (a)),
+			     INPUTS("S" (v)),
+			     CLOBBERS("memory", "ecx"));
 	return a;
 }
 
 static inline long long atomic64_dec_return(atomic64_t *v)
 {
 	long long a;
-	alternative_atomic64(dec_return, "=&A" (a),
-			     "S" (v) : "memory", "ecx");
+	alternative_atomic64(dec_return,
+			     OUTPUTS("=&A" (a)),
+			     INPUTS("S" (v)),
+			     CLOBBERS("memory", "ecx"));
 	return a;
 }
 
@@ -175,8 +191,9 @@ static inline long long atomic64_dec_return(atomic64_t *v)
 static inline long long atomic64_add(long long i, atomic64_t *v)
 {
 	__alternative_atomic64(add, add_return,
-			       ASM_OUTPUT2("+A" (i), "+c" (v)),
-			       ASM_NO_INPUT_CLOBBER("memory"));
+			       OUTPUTS("+A" (i), "+c" (v)),
+			       INPUTS(),
+			       CLOBBERS("memory"));
 	return i;
 }
 
@@ -190,8 +207,9 @@ static inline long long atomic64_add(long long i, atomic64_t *v)
 static inline long long atomic64_sub(long long i, atomic64_t *v)
 {
 	__alternative_atomic64(sub, sub_return,
-			       ASM_OUTPUT2("+A" (i), "+c" (v)),
-			       ASM_NO_INPUT_CLOBBER("memory"));
+			       OUTPUTS("+A" (i), "+c" (v)),
+			       INPUTS(),
+			       CLOBBERS("memory"));
 	return i;
 }
 
@@ -217,8 +235,10 @@ static inline int atomic64_sub_and_test(long long i, atomic64_t *v)
  */
 static inline void atomic64_inc(atomic64_t *v)
 {
-	__alternative_atomic64(inc, inc_return, /* no output */,
-			       "S" (v) : "memory", "eax", "ecx", "edx");
+	__alternative_atomic64(inc, inc_return,
+			       OUTPUTS(),
+			       INPUTS("S" (v)),
+			       CLOBBERS("memory", "eax", "ecx", "edx"));
 }
 
 /**
@@ -229,8 +249,10 @@ static inline void atomic64_inc(atomic64_t *v)
  */
 static inline void atomic64_dec(atomic64_t *v)
 {
-	__alternative_atomic64(dec, dec_return, /* no output */,
-			       "S" (v) : "memory", "eax", "ecx", "edx");
+	__alternative_atomic64(dec, dec_return,
+			       OUTPUTS(),
+			       INPUTS("S" (v)),
+			       CLOBBERS("memory", "eax", "ecx", "edx"));
 }
 
 /**
@@ -287,8 +309,9 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
 	unsigned low = (unsigned)u;
 	unsigned high = (unsigned)(u >> 32);
 	alternative_atomic64(add_unless,
-			     ASM_OUTPUT2("+A" (a), "+c" (low), "+D" (high)),
-			     "S" (v) : "memory");
+			     OUTPUTS("+A" (a), "+c" (low), "+D" (high)),
+			     INPUTS("S" (v)),
+			     CLOBBERS("memory"));
 	return (int)a;
 }
 
@@ -296,16 +319,20 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
 static inline int atomic64_inc_not_zero(atomic64_t *v)
 {
 	int r;
-	alternative_atomic64(inc_not_zero, "=&a" (r),
-			     "S" (v) : "ecx", "edx", "memory");
+	alternative_atomic64(inc_not_zero,
+			     OUTPUTS("=&a" (r)),
+			     INPUTS("S" (v)),
+			     CLOBBERS("ecx", "edx", "memory"));
 	return r;
 }
 
 static inline long long atomic64_dec_if_positive(atomic64_t *v)
 {
 	long long r;
-	alternative_atomic64(dec_if_positive, "=&A" (r),
-			     "S" (v) : "ecx", "memory");
+	alternative_atomic64(dec_if_positive,
+			     OUTPUTS("=&A" (r)),
+			     INPUTS("S" (v)),
+			     CLOBBERS("ecx", "memory"));
 	return r;
 }
 
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 8154a317899f..9d0db650f73a 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -84,11 +84,11 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
 			"call cmpxchg8b_emu",			\
 			"lock; cmpxchg8b (%%esi)" ,		\
 		       X86_FEATURE_CX8,				\
-		       "=A" (__ret),				\
-		       "S" ((ptr)), "A" (__old),		\
-		       "b" ((unsigned int)__new),		\
-		       "c" ((unsigned int)(__new>>32))		\
-		       : "memory");				\
+		       OUTPUTS("=A" (__ret)),			\
+		       INPUTS("S" ((ptr)), "A" (__old),		\
+			      "b" ((unsigned int)__new),	\
+			      "c" ((unsigned int)(__new>>32))),	\
+		       CLOBBERS("memory"));			\
 	__ret; })
 
 
@@ -100,11 +100,11 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
 	alternative_io("call cmpxchg8b_emu",			\
 		       "cmpxchg8b (%%esi)" ,			\
 		       X86_FEATURE_CX8,				\
-		       "=A" (__ret),				\
-		       "S" ((ptr)), "A" (__old),		\
-		       "b" ((unsigned int)__new),		\
-		       "c" ((unsigned int)(__new>>32))		\
-		       : "memory");				\
+		       OUTPUTS("=A" (__ret)),			\
+		       INPUTS("S" ((ptr)), "A" (__old),		\
+			      "b" ((unsigned int)__new),	\
+			      "c" ((unsigned int)(__new>>32))),	\
+		       CLOBBERS("memory"));			\
 	__ret; })
 
 #endif
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f7dbe752f80d..c6bf768f7708 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,8 +44,9 @@ static inline void clear_page(void *page)
 	alternative_call_2(clear_page_orig,
 			   clear_page_rep, X86_FEATURE_REP_GOOD,
 			   clear_page_erms, X86_FEATURE_ERMS,
-			   "+D" (page),
-			   ASM_NO_INPUT_CLOBBER("memory", "rax", "rcx"));
+			   OUTPUTS("+D" (page)),
+			   INPUTS(),
+			   CLOBBERS("memory", "rax", "rcx"));
 }
 
 void copy_page(void *to, void *from);
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2d1753758b0b..dc421c754e69 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -498,10 +498,10 @@ do {									\
 		       "cmpxchg16b " __percpu_arg([p1]) "\n\t"		\
 		       "setz %[ret]\n\t",				\
 		       X86_FEATURE_CX16,				\
-		       ASM_OUTPUT2([ret] "=a" (__ret),			\
-				   [p1] "+m" (pcp1), "+m" (pcp2),	\
-				   "+d" (__o2)),			\
-		       "b" (__n1), "c" (__n2), "a" (__o1) : "rsi");	\
+		       OUTPUTS([ret] "=a" (__ret), [p1] "+m" (pcp1),	\
+			       "+m" (pcp2), "+d" (__o2)),		\
+		       INPUTS("b" (__n1), "c" (__n2), "a" (__o1)),	\
+		       CLOBBERS("rsi"));				\
 	__ret;								\
 })
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 51cf1c7e9aca..44d7c6f033c9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -780,9 +780,10 @@ extern char			ignore_fpu_irq;
  */
 static inline void prefetch(const void *x)
 {
-	alternative_input(BASE_PREFETCH, "prefetchnta %P[x]",
-			  X86_FEATURE_XMM,
-			  [x] "m" (*(const char *)x));
+	alternative_io(BASE_PREFETCH, "prefetchnta %P[x]",
+		       X86_FEATURE_XMM,
+		       OUTPUTS(),
+		       INPUTS([x] "m" (*(const char *)x)));
 }
 
 /*
@@ -792,9 +793,10 @@ static inline void prefetch(const void *x)
  */
 static inline void prefetchw(const void *x)
 {
-	alternative_input(BASE_PREFETCH, "prefetchw %P[x]",
-			  X86_FEATURE_3DNOWPREFETCH,
-			  [x] "m" (*(const char *)x));
+	alternative_io(BASE_PREFETCH, "prefetchw %P[x]",
+		       X86_FEATURE_3DNOWPREFETCH,
+		       OUTPUTS(),
+		       INPUTS([x] "m" (*(const char *)x)));
 }
 
 static inline void spin_lock_prefetch(const void *x)
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeee6517ccc6..fa2caa1e77a5 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -219,7 +219,8 @@ static inline void clflushopt(volatile void *__p)
 	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P[p]",
 		       ".byte 0x66; clflush %P[p]",
 		       X86_FEATURE_CLFLUSHOPT,
-		       [p] "+m" (*(volatile char __force *)__p));
+		       OUTPUTS([p] "+m" (*(volatile char __force *)__p)),
+		       INPUTS());
 }
 
 static inline void clwb(volatile void *__p)
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index e09f71424795..9afd40681e43 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -38,10 +38,10 @@ copy_user_generic(void *to, const void *from, unsigned len)
 			   X86_FEATURE_REP_GOOD,
 			   copy_user_enhanced_fast_string,
 			   X86_FEATURE_ERMS,
-			   ASM_OUTPUT2("=a" (ret), "+D" (to), "+S" (from),
-				       "+d" (len)),
-			   ASM_NO_INPUT_CLOBBER("memory", "rcx", "r8", "r9",
-						"r10", "r11"));
+			   OUTPUTS("=a" (ret), "+D" (to), "+S" (from),
+				   "+d" (len)),
+			   INPUTS(),
+			   CLOBBERS("memory", "rcx", "r8", "r9", "r10", "r11"));
 	return ret;
 }
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index dfaaeec4a32e..c738966434c1 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -620,4 +620,33 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 	(_________p1); \
 })
 
+#define __ARG17(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12,	\
+		_13, _14, _15, _16, _17, ...) _17
+
+/* Return 1 if the macro has arguments, 0 otherwise. */
+#define HAS_ARGS(...) __ARG17(0, ## __VA_ARGS__, 1, 1, 1, 1, 1, 1, 1,	\
+			      1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
+
+/* Macros for passing inline asm constraints as macro arguments */
+#define ARGS(args...) args
+#define OUTPUTS ARGS
+#define INPUTS ARGS
+#define CLOBBERS ARGS
+
+/* If the macro has arguments, prepend them with a comma. */
+#define ARGS_APPEND(...) , ## __VA_ARGS__
+
+#define __CLOBBERS_APPEND_0(...)
+#define __CLOBBERS_APPEND_1(...)  : __VA_ARGS__
+
+#define __CLOBBERS_APPEND(has_args, ...) \
+	__CLOBBERS_APPEND_ ## has_args (__VA_ARGS__)
+
+#define _CLOBBERS_APPEND(has_args, ...) \
+	__CLOBBERS_APPEND(has_args, __VA_ARGS__)
+
+/* If the macro has arguments, prepend them with a colon. */
+#define CLOBBERS_APPEND(...) \
+	_CLOBBERS_APPEND(HAS_ARGS(__VA_ARGS__), __VA_ARGS__)
+
 #endif /* __LINUX_COMPILER_H */
-- 
2.13.5


* [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions
  2017-08-31 14:11 [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2017-08-31 14:11 ` [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent Josh Poimboeuf
@ 2017-08-31 14:11 ` Josh Poimboeuf
  2017-08-31 14:50   ` Peter Zijlstra
  3 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 14:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

Inline asm statements which have call instructions can be problematic.
GCC doesn't know about the call instructions, so in some cases it can
insert the asm before setting up the frame pointer.  This can result in
bad stack traces when unwinding from the called function.

Previously we worked around this issue by listing the stack pointer as
an input/output constraint for the inline asm.  That works for GCC, but
unfortunately it doesn't work for Clang.  In fact, it causes Clang to
corrupt the stack pointer.

Introduce a new ASM_CALL() macro, which should be used for all inline
asm statements which have call instructions.  On GCC with frame
pointers, it sets the stack pointer as an input/output constraint, like
before.  On GCC without frame pointers, it omits the constraint, which
saves a small amount of text.  On Clang, it also omits the constraint
(for now).
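
For reference, a rough sketch of the GCC side of the wrapper (this is a
simplified illustration of the behavior described above, not the exact
code; the real definitions live in compiler-gcc.h and compiler-clang.h
and build on the ARGS_APPEND()/CLOBBERS_APPEND() helpers from the
previous patch):

  #ifdef CONFIG_FRAME_POINTER
  # define ASM_CALL(str, outputs, inputs, clobbers...)         \
  ({                                                            \
          register void *__sp asm(_ASM_SP);                     \
          asm volatile(str                                      \
                       : "+r" (__sp) ARGS_APPEND(outputs)       \
                       : inputs                                 \
                       CLOBBERS_APPEND(clobbers));              \
  })
  #else
  # define ASM_CALL(str, outputs, inputs, clobbers...)         \
          asm volatile(str                                      \
                       : outputs                                \
                       : inputs                                 \
                       CLOBBERS_APPEND(clobbers))
  #endif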

Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/alternative.h               |  28 +++---
 arch/x86/include/asm/arch_hweight.h              |  14 +--
 arch/x86/include/asm/atomic64_32.h               |  10 +-
 arch/x86/include/asm/mshyperv.h                  |  52 +++++-----
 arch/x86/include/asm/paravirt_types.h            |  54 +++++------
 arch/x86/include/asm/preempt.h                   |  15 +--
 arch/x86/include/asm/processor.h                 |  17 ++--
 arch/x86/include/asm/rwsem.h                     | 115 +++++++++++------------
 arch/x86/include/asm/uaccess.h                   |  24 ++---
 arch/x86/include/asm/xen/hypercall.h             |  59 ++++++------
 arch/x86/kvm/emulate.c                           |   9 +-
 arch/x86/kvm/vmx.c                               |  17 ++--
 arch/x86/mm/fault.c                              |  13 ++-
 include/linux/compiler-clang.h                   |   2 +
 include/linux/compiler-gcc.h                     |  19 ++++
 include/linux/compiler.h                         |   5 +
 tools/objtool/Documentation/stack-validation.txt |  19 ++--
 17 files changed, 237 insertions(+), 235 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 53f18258c86f..d10179652999 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -196,12 +196,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
  */
 #define alternative_call(oldfunc, newfunc, feature, outputs, inputs,	\
 			 clobbers...)					\
-	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]",	\
-				  feature),				\
-		      : outputs						\
-		      : [old] "i" (oldfunc), [new] "i" (newfunc)	\
-		        ARGS_APPEND(inputs)				\
-		      CLOBBERS_APPEND(clobbers))
+	ASM_CALL(ALTERNATIVE("call %P[old]", "call %P[new]", feature),	\
+		 OUTPUTS(outputs),					\
+		 INPUTS([old] "i" (oldfunc), [new] "i" (newfunc)	\
+			ARGS_APPEND(inputs)),				\
+		 clobbers)
 
 /*
  * Like alternative_call, but there are two features and respective functions.
@@ -211,16 +210,13 @@ static inline int alternatives_text_reserved(void *start, void *end)
  */
 #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2,	\
 			   feature2, outputs, inputs, clobbers...)	\
-{									\
-	register void *__sp asm(_ASM_SP);				\
-	asm volatile (ALTERNATIVE_2("call %P[old]",			\
-				    "call %P[new1]", feature1,		\
-				    "call %P[new2]", feature2)		\
-		      : "+r" (__sp) ARGS_APPEND(outputs)		\
-		      : [old] "i" (oldfunc), [new1] "i" (newfunc1),	\
-			[new2] "i" (newfunc2) ARGS_APPEND(inputs)	\
-		      CLOBBERS_APPEND(clobbers));			\
-}
+	ASM_CALL(ALTERNATIVE_2("call %P[old]",				\
+			       "call %P[new1]", feature1,		\
+			       "call %P[new2]", feature2),		\
+		 OUTPUTS(outputs),					\
+		 INPUTS([old] "i" (oldfunc), [new1] "i" (newfunc1),	\
+			[new2] "i" (newfunc2) ARGS_APPEND(inputs)),	\
+		 clobbers)
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index e7cd63175de4..3da74407eca0 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -23,9 +23,10 @@ static __always_inline unsigned int __arch_hweight32(unsigned int w)
 {
 	unsigned int res;
 
-	asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
-			 : "="REG_OUT (res)
-			 : REG_IN (w));
+	ASM_CALL(ALTERNATIVE("call __sw_hweight32", POPCNT32,
+			     X86_FEATURE_POPCNT),
+		 OUTPUTS("="REG_OUT (res)),
+		 INPUTS(REG_IN (w)));
 
 	return res;
 }
@@ -51,9 +52,10 @@ static __always_inline unsigned long __arch_hweight64(__u64 w)
 {
 	unsigned long res;
 
-	asm (ALTERNATIVE("call __sw_hweight64", POPCNT64, X86_FEATURE_POPCNT)
-			 : "="REG_OUT (res)
-			 : REG_IN (w));
+	ASM_CALL(ALTERNATIVE("call __sw_hweight64", POPCNT64,
+			     X86_FEATURE_POPCNT),
+		 OUTPUTS("="REG_OUT (res)),
+		 INPUTS(REG_IN (w)));
 
 	return res;
 }
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 7ab0efe8a13d..cb74b47ea71b 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -23,11 +23,11 @@ typedef struct {
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define __alternative_atomic64(f, g, outputs, inputs, clobbers...)	\
-	asm volatile("call %P[func]"					\
-		     : outputs						\
-		     : [func] "i" (atomic64_##g##_cx8)			\
-		       ARGS_APPEND(inputs)				\
-		     CLOBBERS_APPEND(clobbers))
+	ASM_CALL("call %P[func]",					\
+		 OUTPUTS(outputs),					\
+		 INPUTS([func] "i" (atomic64_##g##_cx8)			\
+			ARGS_APPEND(inputs)),				\
+		 clobbers)
 
 #define ATOMIC64_DECL(sym) ATOMIC64_DECL_ONE(sym##_cx8)
 #else
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index d0675d58fa32..d46b384ec987 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -177,19 +177,17 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 input_address = input ? virt_to_phys(input) : 0;
 	u64 output_address = output ? virt_to_phys(output) : 0;
 	u64 hv_status;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("mov %[out], %%r8\n"
-			     "call *%[pg]"
-			     : "=a" (hv_status), "+r" (__sp),
-			       "+c" (control), "+d" (input_address)
-			     :  [out] "r" (output_address),
-			        [pg] "m" (hv_hypercall_pg)
-			     : "cc", "memory", "r8", "r9", "r10", "r11");
+	ASM_CALL("mov %[out], %%r8\n"
+		 "call *%[pg]",
+		 OUTPUTS("=a" (hv_status), "+c" (control),
+			 "+d" (input_address)),
+		 INPUTS([out] "r" (output_address), [pg] "m" (hv_hypercall_pg)),
+		 CLOBBERS("cc", "memory", "r8", "r9", "r10", "r11"));
 #else
 	u32 input_address_hi = upper_32_bits(input_address);
 	u32 input_address_lo = lower_32_bits(input_address);
@@ -199,14 +197,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
-	__asm__ __volatile__("call *%[pg]"
-			     : "=A" (hv_status),
-			       "+c" (input_address_lo), "+r" (__sp)
-			     : "A" (control),
-			       "b" (input_address_hi),
-			       "D"(output_address_hi), "S"(output_address_lo),
-			       [pg] "m" (hv_hypercall_pg)
-			     : "cc", "memory");
+	ASM_CALL("call *%[pg]",
+		 OUTPUTS("=A" (hv_status), "+c" (input_address_lo)),
+		 INPUTS("A" (control), "b" (input_address_hi),
+			"D" (output_address_hi), "S" (output_address_lo),
+			[pg] "m" (hv_hypercall_pg)),
+		 CLOBBERS("cc", "memory"));
 #endif /* !x86_64 */
 	return hv_status;
 }
@@ -223,29 +219,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 {
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	{
-		__asm__ __volatile__("call *%[pg]"
-				     : "=a" (hv_status), "+r" (__sp),
-				       "+c" (control), "+d" (input1)
-				     : [pg] "m" (hv_hypercall_pg)
-				     : "cc", "r8", "r9", "r10", "r11");
+		ASM_CALL("call *%[pg]",
+			 OUTPUTS("=a" (hv_status), "+c" (control),
+				 "+d" (input1)),
+			 INPUTS([pg] "m" (hv_hypercall_pg)),
+			 CLOBBERS("cc", "r8", "r9", "r10", "r11"));
 	}
 #else
 	{
 		u32 input1_hi = upper_32_bits(input1);
 		u32 input1_lo = lower_32_bits(input1);
 
-		__asm__ __volatile__ ("call *%[pg]"
-				      : "=A"(hv_status),
-					"+c"(input1_lo),
-					"+r"(__sp)
-				      :	"A" (control),
-					"b" (input1_hi),
-					[pg] "m" (hv_hypercall_pg)
-				      : "cc", "edi", "esi");
+		ASM_CALL("call *%[pg]",
+			 OUTPUTS("=A" (hv_status), "+c" (input1_lo)),
+			 INPUTS("A" (control), "b" (input1_hi),
+				[pg] "m" (hv_hypercall_pg)),
+			 CLOBBERS("cc", "edi", "esi"));
 	}
 #endif
 		return hv_status;
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index a509259a3181..e97143fbc4c0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -471,8 +471,7 @@ int paravirt_disable_iospace(void);
  */
 #ifdef CONFIG_X86_32
 #define PVOP_VCALL_ARGS							\
-	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;	\
-	register void *__sp asm("esp")
+	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 #define PVOP_CALL_ARGS			PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"a" ((unsigned long)(x))
@@ -492,8 +491,7 @@ int paravirt_disable_iospace(void);
 /* [re]ax isn't an arg, but the return val */
 #define PVOP_VCALL_ARGS						\
 	unsigned long __edi = __edi, __esi = __esi,		\
-		__edx = __edx, __ecx = __ecx, __eax = __eax;	\
-	register void *__sp asm("rsp")
+		__edx = __edx, __ecx = __ecx, __eax = __eax;
 #define PVOP_CALL_ARGS		PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"D" ((unsigned long)(x))
@@ -541,24 +539,24 @@ int paravirt_disable_iospace(void);
 		/* This is 32-bit specific, but is okay in 64-bit */	\
 		/* since this condition will never hold */		\
 		if (sizeof(rettype) > sizeof(unsigned long)) {		\
-			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
-				     post				\
-				     : outputs, "+r" (__sp)		\
-				     : paravirt_type(op),		\
-				       paravirt_clobber(clbr),		\
-				       ##__VA_ARGS__			\
-				     : "memory", "cc" extra_clbr);	\
+			ASM_CALL(pre					\
+				 paravirt_alt(PARAVIRT_CALL)		\
+				 post,					\
+				 OUTPUTS(outputs),			\
+				 INPUTS(paravirt_type(op),		\
+					paravirt_clobber(clbr),		\
+					##__VA_ARGS__),			\
+				 CLOBBERS("memory", "cc" extra_clbr));	\
 			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
 		} else {						\
-			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
-				     post				\
-				     : outputs, "+r" (__sp)		\
-				     : paravirt_type(op),		\
-				       paravirt_clobber(clbr),		\
-				       ##__VA_ARGS__			\
-				     : "memory", "cc" extra_clbr);	\
+			ASM_CALL(pre					\
+				 paravirt_alt(PARAVIRT_CALL)		\
+				 post,					\
+				 OUTPUTS(outputs),			\
+				 INPUTS(paravirt_type(op),		\
+					paravirt_clobber(clbr),		\
+					##__VA_ARGS__),			\
+				 CLOBBERS("memory", "cc" extra_clbr));	\
 			__ret = (rettype)(__eax & PVOP_RETMASK(rettype));	\
 		}							\
 		__ret;							\
@@ -578,14 +576,14 @@ int paravirt_disable_iospace(void);
 	({								\
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
-		asm volatile(pre					\
-			     paravirt_alt(PARAVIRT_CALL)		\
-			     post					\
-			     : outputs, "+r" (__sp)			\
-			     : paravirt_type(op),			\
-			       paravirt_clobber(clbr),			\
-			       ##__VA_ARGS__				\
-			     : "memory", "cc" extra_clbr);		\
+		ASM_CALL(pre						\
+			 paravirt_alt(PARAVIRT_CALL)			\
+			 post,						\
+			 OUTPUTS(outputs),				\
+			 INPUTS(paravirt_type(op),			\
+				paravirt_clobber(clbr),			\
+				##__VA_ARGS__),				\
+			 CLOBBERS("memory", "cc" extra_clbr));		\
 	})
 
 #define __PVOP_VCALL(op, pre, post, ...)				\
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..0d9316a73af0 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -100,19 +100,14 @@ static __always_inline bool should_resched(int preempt_offset)
 
 #ifdef CONFIG_PREEMPT
   extern asmlinkage void ___preempt_schedule(void);
-# define __preempt_schedule()					\
-({								\
-	register void *__sp asm(_ASM_SP);			\
-	asm volatile ("call ___preempt_schedule" : "+r"(__sp));	\
-})
+# define __preempt_schedule() \
+	ASM_CALL("call ___preempt_schedule", OUTPUTS(), INPUTS())
 
   extern asmlinkage void preempt_schedule(void);
   extern asmlinkage void ___preempt_schedule_notrace(void);
-# define __preempt_schedule_notrace()					\
-({									\
-	register void *__sp asm(_ASM_SP);				\
-	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp));	\
-})
+# define __preempt_schedule_notrace() \
+	ASM_CALL("call ___preempt_schedule_notrace", OUTPUTS(), INPUTS())
+
   extern asmlinkage void preempt_schedule_notrace(void);
 #endif
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44d7c6f033c9..8f5cd252be53 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,20 +677,21 @@ static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_32
-	asm volatile (
+	ASM_CALL(
 		"pushfl\n\t"
 		"pushl %%cs\n\t"
 		"pushl $1f\n\t"
 		"iret\n\t"
-		"1:"
-		: "+r" (__sp) : : "memory");
+		"1:",
+		OUTPUTS(),
+		INPUTS(),
+		CLOBBERS("memory"));
 #else
 	unsigned int tmp;
 
-	asm volatile (
+	ASM_CALL(
 		UNWIND_HINT_SAVE
 		"mov %%ss, %[tmp]\n\t"
 		"pushq %q[tmp]\n\t"
@@ -702,8 +703,10 @@ static inline void sync_core(void)
 		"pushq $1f\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
-		"1:"
-		: [tmp] "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+		"1:",
+		OUTPUTS([tmp] "=&r" (tmp)),
+		INPUTS(),
+		CLOBBERS("cc", "memory"));
 #endif
 }
 
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index b715152fb2b5..72253028b740 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -62,16 +62,16 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX _ASM_INC "(%[sem])\n\t"
-		     /* adds 0x00000001 */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_down_read_failed\n\t"
-		     "1:\n\t"
-		     "# ending down_read\n\t"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem)
-		     : "memory", "cc");
+	ASM_CALL("# beginning down_read\n\t"
+		 LOCK_PREFIX _ASM_INC "(%[sem])\n\t"
+		 /* adds 0x00000001 */
+		 "  jns        1f\n\t"
+		 "  call call_rwsem_down_read_failed\n\t"
+		 "1:\n\t"
+		 "# ending down_read\n\t",
+		 OUTPUTS("+m" (sem->count)),
+		 INPUTS([sem] "a" (sem)),
+		 CLOBBERS("memory", "cc"));
 }
 
 /*
@@ -104,22 +104,21 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 ({									\
 	long tmp;							\
 	struct rw_semaphore* ret;					\
-	register void *__sp asm(_ASM_SP);				\
 									\
-	asm volatile("# beginning down_write\n\t"			\
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"	\
-		     /* adds 0xffff0001, returns the old value */	\
-		     "  test      " __ASM_SEL_RAW(%w,%k) "[tmp],"	\
-				    __ASM_SEL_RAW(%w,%k) "[tmp]\n\t"	\
-		     /* was the active mask 0 before? */		\
-		     "  jz        1f\n"					\
-		     "  call " slow_path "\n\t"				\
-		     "1:\n\t"						\
-		     "# ending down_write\n\t"				\
-		     : "+m" (sem->count), [tmp] "=d" (tmp), "=a" (ret),	\
-		       "+r" (__sp)					\
-		     : [sem] "a" (sem), "d" (RWSEM_ACTIVE_WRITE_BIAS)	\
-		     : "memory", "cc");					\
+	ASM_CALL("# beginning down_write\n\t"				\
+		 LOCK_PREFIX "  xadd %[tmp],(%[sem])\n\t"		\
+		 /* adds 0xffff0001, returns the old value */		\
+		 "  test      " __ASM_SEL_RAW(%w,%k) "[tmp],"		\
+				__ASM_SEL_RAW(%w,%k) "[tmp]\n\t"	\
+		 /* was the active mask 0 before? */			\
+		 "  jz        1f\n"					\
+		 "  call " slow_path "\n\t"				\
+		 "1:\n\t"						\
+		 "# ending down_write\n\t",				\
+		 OUTPUTS("+m" (sem->count), [tmp] "=d" (tmp),		\
+			 "=a" (ret)),					\
+		 INPUTS([sem] "a" (sem), "d" (RWSEM_ACTIVE_WRITE_BIAS)),\
+		 CLOBBERS("memory", "cc"));				\
 	ret;								\
 })
 
@@ -170,16 +169,16 @@ static inline bool __down_write_trylock(struct rw_semaphore *sem)
 static inline void __up_read(struct rw_semaphore *sem)
 {
 	long tmp;
-	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 1, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n"
-		     "# ending __up_read\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "d" (-RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
+	ASM_CALL("# beginning __up_read\n\t"
+		 LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
+		 /* subtracts 1, returns the old value */
+		 "  jns        1f\n\t"
+		 "  call call_rwsem_wake\n\t" /* expects old value in %edx */
+		 "1:\n\t"
+		 "# ending __up_read\n\t",
+		 OUTPUTS("+m" (sem->count), [tmp] "=d" (tmp)),
+		 INPUTS([sem] "a" (sem), "d" (-RWSEM_ACTIVE_READ_BIAS)),
+		 CLOBBERS("memory", "cc"));
 }
 
 /*
@@ -188,16 +187,16 @@ static inline void __up_read(struct rw_semaphore *sem)
 static inline void __up_write(struct rw_semaphore *sem)
 {
 	long tmp;
-	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 0xffff0001, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n\t"
-		     "# ending __up_write\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "d" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
+	ASM_CALL("# beginning __up_write\n\t"
+		 LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
+		 /* subtracts 0xffff0001, returns the old value */
+		 "  jns        1f\n\t"
+		 "  call call_rwsem_wake\n\t" /* expects old value in %edx */
+		 "1:\n\t"
+		 "# ending __up_write\n\t",
+		 OUTPUTS("+m" (sem->count), [tmp] "=d" (tmp)),
+		 INPUTS([sem] "a" (sem), "d" (-RWSEM_ACTIVE_WRITE_BIAS)),
+		 CLOBBERS("memory", "cc"));
 }
 
 /*
@@ -205,19 +204,19 @@ static inline void __up_write(struct rw_semaphore *sem)
  */
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
-	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%[bias],(%[sem])\n\t"
-		     /*
-		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
-		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
-		      */
-		     "  jns       1f\n\t"
-		     "  call call_rwsem_downgrade_wake\n"
-		     "1:\n\t"
-		     "# ending __downgrade_write\n"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem), [bias] "er" (-RWSEM_WAITING_BIAS)
-		     : "memory", "cc");
+	ASM_CALL("# beginning __downgrade_write\n\t"
+		 LOCK_PREFIX _ASM_ADD "%[bias],(%[sem])\n\t"
+		 /*
+		  * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
+		  * 0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
+		  */
+		  "  jns       1f\n\t"
+		  "  call call_rwsem_downgrade_wake\n\t"
+		  "1:\n\t"
+		  "# ending __downgrade_write\n\t",
+		  OUTPUTS("+m" (sem->count)),
+		  INPUTS([sem] "a" (sem), [bias] "er" (-RWSEM_WAITING_BIAS)),
+		  CLOBBERS("memory", "cc"));
 }
 
 #endif /* __KERNEL__ */
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 12fb37310872..1d43478f95bd 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,20 +166,20 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({									\
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
-	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
-	asm volatile("call __get_user_%P[size]"				\
-		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
-		     : "a" (ptr), [size] "i" (sizeof(*(ptr))));		\
+	ASM_CALL("call __get_user_%P[size]",				\
+		 OUTPUTS("=a" (__ret_gu), "=r" (__val_gu)),		\
+		 INPUTS("a" (ptr), [size] "i" (sizeof(*(ptr)))));	\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
 	__builtin_expect(__ret_gu, 0);					\
 })
 
-#define __put_user_x(size, x, ptr, __ret_pu)			\
-	asm volatile("call __put_user_" #size : "=a" (__ret_pu)	\
-		     : "a" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
-
+#define __put_user_x(size, x, ptr, __ret_pu)				\
+	ASM_CALL("call __put_user_" #size,				\
+		 OUTPUTS("=a" (__ret_pu)),				\
+		 INPUTS("a" ((typeof(*(ptr)))(x)), "c" (ptr)),		\
+		 CLOBBERS("ebx"))
 
 
 #ifdef CONFIG_X86_32
@@ -206,9 +206,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 		     _ASM_EXTABLE_EX(2b, 3b)				\
 		     : : "A" (x), "r" (addr))
 
-#define __put_user_x8(x, ptr, __ret_pu)				\
-	asm volatile("call __put_user_8" : "=a" (__ret_pu)	\
-		     : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
+#define __put_user_x8(x, ptr, __ret_pu)					\
+	ASM_CALL("call __put_user_8",					\
+		 OUTPUTS("=a" (__ret_pu)),				\
+		 INPUTS("A" ((typeof(*(ptr)))(x)), "c" (ptr)),		\
+		 CLOBBERS("ebx"))
 #else
 #define __put_user_asm_u64(x, ptr, retval, errret) \
 	__put_user_asm(x, ptr, retval, "q", "", "er", errret)
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9606688caa4b..7f205139525d 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -114,9 +114,8 @@ extern struct { char _entry[32]; } hypercall_page[];
 	register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
 	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
 	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
-	register void *__sp asm(_ASM_SP);
 
-#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM	"=r" (__res)
 #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
 #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
 #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
@@ -146,10 +145,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_0ARG();						\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_0PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER0);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_0PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER0));			\
 	(type)__res;							\
 })
 
@@ -157,10 +156,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_1ARG(a1);						\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_1PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER1);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_1PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER1));			\
 	(type)__res;							\
 })
 
@@ -168,10 +167,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_2ARG(a1, a2);					\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_2PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER2);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_2PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER2));			\
 	(type)__res;							\
 })
 
@@ -179,10 +178,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_3ARG(a1, a2, a3);					\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_3PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER3);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_3PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER3));			\
 	(type)__res;							\
 })
 
@@ -190,10 +189,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_4ARG(a1, a2, a3, a4);				\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_4PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER4);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_4PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER4));			\
 	(type)__res;							\
 })
 
@@ -201,10 +200,10 @@ extern struct { char _entry[32]; } hypercall_page[];
 ({									\
 	__HYPERCALL_DECLS;						\
 	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);				\
-	asm volatile (__HYPERCALL					\
-		      : __HYPERCALL_5PARAM				\
-		      : __HYPERCALL_ENTRY(name)				\
-		      : __HYPERCALL_CLOBBER5);				\
+	ASM_CALL(__HYPERCALL,						\
+		 OUTPUTS(__HYPERCALL_5PARAM),				\
+		 INPUTS(__HYPERCALL_ENTRY(name)),			\
+		 CLOBBERS(__HYPERCALL_CLOBBER5));			\
 	(type)__res;							\
 })
 
@@ -218,10 +217,10 @@ privcmd_call(unsigned call,
 	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
 
 	stac();
-	asm volatile("call *%[call]"
-		     : __HYPERCALL_5PARAM
-		     : [call] "a" (&hypercall_page[call])
-		     : __HYPERCALL_CLOBBER5);
+	ASM_CALL("call *%[call]",
+		 OUTPUTS(__HYPERCALL_5PARAM),
+		 INPUTS([call] "a" (&hypercall_page[call])),
+		 CLOBBERS(__HYPERCALL_CLOBBER5));
 	clac();
 
 	return (long)__res;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fb0055953fbc..d1ac8b58e5df 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5284,16 +5284,15 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
 
 static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 {
-	register void *__sp asm(_ASM_SP);
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
 		fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
 
-	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
-	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
-	      [fastop]"+S"(fop), "+r"(__sp)
-	    : "c"(ctxt->src2.val));
+	ASM_CALL("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n",
+		 OUTPUTS("+a"(ctxt->dst.val), "+d"(ctxt->src.val),
+			 [flags]"+D"(flags), [fastop]"+S"(fop)),
+		 INPUTS("c"(ctxt->src2.val)));
 
 	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
 	if (!fop) /* exception is returned in fop variable */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 70b90c0810d0..16f0c782204f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8765,7 +8765,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 {
 	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	register void *__sp asm(_ASM_SP);
 
 	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
 			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -8780,7 +8779,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 		vector =  exit_intr_info & INTR_INFO_VECTOR_MASK;
 		desc = (gate_desc *)vmx->host_idt_base + vector;
 		entry = gate_offset(desc);
-		asm volatile(
+		ASM_CALL(
 #ifdef CONFIG_X86_64
 			"mov %%" _ASM_SP ", %[sp]\n\t"
 			"and $0xfffffffffffffff0, %%" _ASM_SP "\n\t"
@@ -8789,16 +8788,14 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 #endif
 			"pushf\n\t"
 			__ASM_SIZE(push) " $%c[cs]\n\t"
-			"call *%[entry]\n\t"
-			:
+			"call *%[entry]\n\t",
 #ifdef CONFIG_X86_64
-			[sp]"=&r"(tmp),
+			[sp]"=&r"(tmp)
 #endif
-			"+r"(__sp)
-			:
-			[entry]"r"(entry),
-			[ss]"i"(__KERNEL_DS),
-			[cs]"i"(__KERNEL_CS)
+			OUTPUTS(),
+			INPUTS([entry] "r" (entry),
+			       [ss] "i" (__KERNEL_DS),
+			       [cs] "i" (__KERNEL_CS))
 			);
 	}
 }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b836a7274e12..395fb8108744 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		register void *__sp asm("rsp");
 		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
@@ -818,13 +817,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		 * and then double-fault, though, because we're likely to
 		 * break the console driver and lose most of the stack dump.
 		 */
-		asm volatile ("movq %[stack], %%rsp\n\t"
-			      "call handle_stack_overflow\n\t"
-			      "1: jmp 1b"
-			      : "+r" (__sp)
-			      : "D" ("kernel stack overflow (page fault)"),
+		ASM_CALL("movq %[stack], %%rsp\n\t"
+			 "call handle_stack_overflow\n\t"
+			 "1: jmp 1b",
+			 OUTPUTS(),
+			 INPUTS("D" ("kernel stack overflow (page fault)"),
 				"S" (regs), "d" (address),
-				[stack] "rm" (stack));
+				[stack] "rm" (stack)));
 		unreachable();
 	}
 #endif
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index de179993e039..8523591a092a 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -15,3 +15,5 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+
+#undef ASM_CALL
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 16d41de92ee3..08e5f24147ed 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -128,6 +128,25 @@
 #define __always_unused		__attribute__((unused))
 #define __mode(x)               __attribute__((mode(x)))
 
+#ifdef CONFIG_FRAME_POINTER
+/*
+ * All x86 inline asm statements with a 'call' instruction must use this macro.
+ * It ensures that GCC sets up the containing function's frame pointer before
+ * inserting the asm.
+ *
+ * WARNING: Positional operand names ("%0") and constraints ("0" (foo)) are
+ * 	    not allowed.
+ */
+#define ASM_CALL(str, outputs, inputs, clobbers...)			\
+({									\
+	register void *__sp asm(_ASM_SP);				\
+	asm volatile(str						\
+		     : "+r" (__sp) ARGS_APPEND(outputs)			\
+		     : inputs						\
+		     CLOBBERS_APPEND(clobbers));			\
+})
+#endif
+
 /* gcc version specific checks */
 
 #if GCC_VERSION < 30200
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index c738966434c1..4843a3843103 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -649,4 +649,9 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 #define CLOBBERS_APPEND(...) \
 	_CLOBBERS_APPEND(HAS_ARGS(__VA_ARGS__), __VA_ARGS__)
 
+#ifndef ASM_CALL
+# define ASM_CALL(str, outputs, inputs, clobbers...) \
+	asm volatile(str : outputs : inputs CLOBBERS_APPEND(clobbers))
+#endif
+
 #endif /* __LINUX_COMPILER_H */
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 6a1af43862df..766a00ebf80c 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -193,13 +193,8 @@ they mean, and suggestions for how to fix them.
 
    If it's a GCC-compiled .c file, the error may be because the function
    uses an inline asm() statement which has a "call" instruction.  An
-   asm() statement with a call instruction must declare the use of the
-   stack pointer in its output operand.  For example, on x86_64:
-
-     register void *__sp asm("rsp");
-     asm volatile("call func" : "+r" (__sp));
-
-   Otherwise the stack frame may not get created before the call.
+   asm() statement with a call instruction must use the ASM_CALL macro,
+   which forces the frame pointer to be saved before the call.
 
 
 2. file.o: warning: objtool: .text+0x53: unreachable instruction
@@ -221,7 +216,7 @@ they mean, and suggestions for how to fix them.
    change ENDPROC to END.
 
 
-4. file.o: warning: objtool: func(): can't find starting instruction
+3. file.o: warning: objtool: func(): can't find starting instruction
    or
    file.o: warning: objtool: func()+0x11dd: can't decode instruction
 
@@ -230,7 +225,7 @@ they mean, and suggestions for how to fix them.
    section like .data or .rodata.
 
 
-5. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
+4. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
 
    This is a kernel entry/exit instruction like sysenter or iret.  Such
    instructions aren't allowed in a callable function, and are most
@@ -239,7 +234,7 @@ they mean, and suggestions for how to fix them.
    annotated with the unwind hint macros in asm/unwind_hints.h.
 
 
-6. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
+5. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
 
    This is a dynamic jump or a jump to an undefined symbol.  Objtool
    assumed it's a sibling call and detected that the frame pointer
@@ -253,7 +248,7 @@ they mean, and suggestions for how to fix them.
    the unwind hint macros in asm/unwind_hints.h.
 
 
-7. file: warning: objtool: func()+0x5c: stack state mismatch
+6. file: warning: objtool: func()+0x5c: stack state mismatch
 
    The instruction's frame pointer state is inconsistent, depending on
    which execution path was taken to reach the instruction.
@@ -270,7 +265,7 @@ they mean, and suggestions for how to fix them.
    asm/unwind_hints.h.
 
 
-8. file.o: warning: objtool: funcA() falls through to next function funcB()
+7. file.o: warning: objtool: funcA() falls through to next function funcB()
 
    This means that funcA() doesn't end with a return instruction or an
    unconditional jump, and that objtool has determined that the function
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions
  2017-08-31 14:11 ` [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions Josh Poimboeuf
@ 2017-08-31 14:50   ` Peter Zijlstra
  2017-08-31 15:21     ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-08-31 14:50 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann

On Thu, Aug 31, 2017 at 09:11:20AM -0500, Josh Poimboeuf wrote:
> Inline asm statements which have call instructions can be problematic.
> GCC doesn't know about the call instructions, so in some cases it can
> insert the asm before setting up the frame pointer.  This can result in
> bad stack traces when unwinding from the called function.
> 
> Previously we worked around this issue by listing the stack pointer as
> an input/output constraint for the inline asm.  That works for GCC, but
> unfortunately it doesn't work for Clang.  In fact, it causes Clang to
> corrupt the stack pointer.

Sounds like it ought to get fixed regardless and then it might as well
do the right thing ;-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions
  2017-08-31 14:50   ` Peter Zijlstra
@ 2017-08-31 15:21     ` Josh Poimboeuf
  2017-08-31 15:36       ` Dmitry Vyukov
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Linus Torvalds, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann

On Thu, Aug 31, 2017 at 04:50:41PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 31, 2017 at 09:11:20AM -0500, Josh Poimboeuf wrote:
> > Inline asm statements which have call instructions can be problematic.
> > GCC doesn't know about the call instructions, so in some cases it can
> > insert the asm before setting up the frame pointer.  This can result in
> > bad stack traces when unwinding from the called function.
> > 
> > Previously we worked around this issue by listing the stack pointer as
> > an input/output constraint for the inline asm.  That works for GCC, but
> > unfortunately it doesn't work for Clang.  In fact, it causes Clang to
> > corrupt the stack pointer.
> 
> Sounds like it ought to get fixed regardless and then it might as well
> do the right thing ;-)

There was some disagreement about what the "right thing" is because it's
an undocumented and unintuitive interface.

And I use the term "interface" loosely.  It was apparently a side effect
which was mentioned to me on the GCC mailing list.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions
  2017-08-31 15:21     ` Josh Poimboeuf
@ 2017-08-31 15:36       ` Dmitry Vyukov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2017-08-31 15:36 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, x86, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Linus Torvalds,
	Alexander Potapenko, Matthias Kaehlcke, Arnd Bergmann

On Thu, Aug 31, 2017 at 5:21 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Aug 31, 2017 at 04:50:41PM +0200, Peter Zijlstra wrote:
>> On Thu, Aug 31, 2017 at 09:11:20AM -0500, Josh Poimboeuf wrote:
>> > Inline asm statements which have call instructions can be problematic.
>> > GCC doesn't know about the call instructions, so in some cases it can
>> > insert the asm before setting up the frame pointer.  This can result in
>> > bad stack traces when unwinding from the called function.
>> >
>> > Previously we worked around this issue by listing the stack pointer as
>> > an input/output constraint for the inline asm.  That works for GCC, but
>> > unfortunately it doesn't work for Clang.  In fact, it causes Clang to
>> > corrupt the stack pointer.
>>
>> Sounds like it ought to get fixed regardless and then it might as well
>> do the right thing ;-)
>
> There was some disagreement about what the "right thing" is because it's
> an undocumented and unintuitive interface.
>
> And I use the term "interface" loosely.  It was apparently a side effect
> which was mentioned to me on the GCC mailing list.

Yes, as far as I understand, there are just no defined semantics for
this. Passing sp as-is when an asm block asks for sp to be passed in
looks like a perfectly reasonable thing to do (and it also gives
faster code). We could use something like asm("..." ::: "frame"), but
we don't have that in compilers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 14:11 ` [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent Josh Poimboeuf
@ 2017-08-31 16:11   ` Linus Torvalds
  2017-08-31 17:25     ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2017-08-31 16:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Aug 31, 2017 at 7:11 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Make the following changes:
>
> - Give alternative_io(), alternative_call(), and alternative_call_2()
>   consistent interfaces.  The constraints are provided by use of the
>   OUTPUTS(), INPUTS(), and CLOBBERS() macros.

I really think those macro names are way too generic. Putting them in
a core header file and expecting people to never use them is wrong.

So please rename things like "OUTPUTS()" to "ASM_OUTPUTS()" or something.

Yes, it might look slightly worse in the asm expansion.  And no, we
don't actually seem to have anybody using those right now, but I did
find people using both OUTPUT and INPUT in some C files, so those
names are clearly not very unique or distinct.

On the whole, I'm not entirely sure this is the right approach. I
think we should

 (a) approach clang about their obvious bug (a compiler that clobbers
%rsp because we mark it as in/out is clearly buggy)

 (b) ask gcc people if there's some other alternative that would work
with clang as-is rather than the "mark %rsp register as clobbered"

I couldn't actually find the %rsp trick in any docs, I assume it came
from discussions with gcc developers directly. Maybe there is
something else we could do that doesn't upset clang?

Perhaps we can mark the frame pointer as an input, for example? Inputs
also have the advantage that appending to the input list doesn't
change the argument numbering, so we don't need to worry about
numbered arguments (not that I mind the naming of arguments, but I
kind of hate having to do it as part of this series).
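
Purely as an illustration of that idea (a sketch, not something from
this series, with "foo" standing in for a real callee), it would look
roughly like:

   /* hypothetical: hand the frame pointer to the asm as an input */
   register void *__fp asm(_ASM_BP);
   asm volatile("call foo" : /* outputs */ : "r" (__fp) /* , inputs */);

Whether that actually guarantees the frame pointer is set up before
the asm is emitted is exactly the question for the compiler folks.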

Hmm?

              Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 16:11   ` Linus Torvalds
@ 2017-08-31 17:25     ` Josh Poimboeuf
  2017-08-31 17:31       ` Josh Poimboeuf
  2017-09-15 16:53       ` Andrey Ryabinin
  0 siblings, 2 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 17:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Aug 31, 2017 at 09:11:54AM -0700, Linus Torvalds wrote:
> On Thu, Aug 31, 2017 at 7:11 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > Make the following changes:
> >
> > - Give alternative_io(), alternative_call(), and alternative_call_2()
> >   consistent interfaces.  The constraints are provided by use of the
> >   OUTPUTS(), INPUTS(), and CLOBBERS() macros.
> 
> I really think those macro names are way too generic. Putting them in
> a core header file and expecting people to never use them is wrong.
> 
> So please rename things like "OUTPUTS()" to "ASM_OUTPUTS()" or something.
> 
> Yes, it might look slightly worse in the asm expansion.  And no, we
> don't actually seem to have anybody using those right now, but I did
> find people using both OUTPUT and INPUT in some C files, so those
> names are clearly not very unique or distinct.

Makes sense.  I can prepend them with "ASM_".

> On the whole, I'm not entirely sure this is the right approach. I
> think we should
> 
>  (a) approach clang about their obvious bug (a compiler that clobbers
> %rsp because we mark it as in/out is clearly buggy)

Yeah, this would be a good idea.

>  (b) ask gcc people if there's some other alternative that would work
> with clang as-is rather than the "mark %rsp register as clobbered"
>
> I couldn't actually find the %rsp trick in any docs, I assume it came
> from discussions with gcc developers directly. Maybe there is
> something else we could do that doesn't upset clang?

There have been a few other ideas which have *almost* worked:

1) Make the 'register void *__sp asm(_ASM_SP)' a global variable instead
   of a local one.  This works for GCC and doesn't break clang.  However
   it resulted in a lot of changed code on the GCC side.  It looked like
   some optimizations had been disabled, even in functions which
   shouldn't have been affected.

2) Put "sp" in the clobbers list instead of as an i/o constraint.  This
   mostly works for GCC, and doesn't break clang.  However, it causes
   GCC to insert a "lea -0x10(%rbp),%rsp" in the epilogue of every
   affected function.

I can ping the GCC list again and see if there are any other ideas.
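
For reference, rough sketches of 1) and 2) above (illustrative only,
with made-up names and "foo" standing in for the real callee):

  /* 1) global register variable instead of a local one */
  register void *__global_sp asm(_ASM_SP);

  static inline void option1(void)
  {
          asm volatile("call foo" : "+r" (__global_sp));
  }

  /* 2) clobber the stack pointer instead of using an i/o constraint */
  static inline void option2(void)
  {
          asm volatile("call foo" : /* outputs */ : /* inputs */ : "sp");
  }

Neither is obviously a win over what we do today, for the reasons
above.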

> Perhaps we can mark the frame pointer as an input, for example? Inputs
> also have the advantage that appending to the input list doesn't
> change the argument numbering, so we don't need to worry about
> numbered arguments (not that I mind the naming of arguments, but I
> kind of hate having to do it as part of this series).

I'll give it a shot :-)

I'm about to disappear for a few days to celebrate the American labor
movement, so I'll try to follow up on this stuff next week.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 17:25     ` Josh Poimboeuf
@ 2017-08-31 17:31       ` Josh Poimboeuf
  2017-09-02 10:32         ` Ingo Molnar
  2017-09-15 16:53       ` Andrey Ryabinin
  1 sibling, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-08-31 17:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Aug 31, 2017 at 12:25:42PM -0500, Josh Poimboeuf wrote:
> 2) Put "sp" in the clobbers list instead of as an i/o constraint.  This
>    mostly works for GCC, and doesn't break clang.  However, it causes
>    GCC to insert a "lea -0x10(%rbp),%rsp" in the epilogue of every
>    affected function.

And maybe this extra instruction is negligible for performance and not a
big deal?  I might look at this one after the holiday too.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 17:31       ` Josh Poimboeuf
@ 2017-09-02 10:32         ` Ingo Molnar
  2017-09-14 14:48           ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2017-09-02 10:32 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Alexander Potapenko, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Aug 31, 2017 at 12:25:42PM -0500, Josh Poimboeuf wrote:
> > 2) Put "sp" in the clobbers list instead of as an i/o constraint.  This
> >    mostly works for GCC, and doesn't break clang.  However, it causes
> >    GCC to insert a "lea -0x10(%rbp),%rsp" in the epilogue of every
> >    affected function.
> 
> And maybe this extra instruction is negligible for performance and not a
> big deal?  I might look at this one after the holiday too.

Please do statistics of how many functions are affected, on a defconfig-ish 
kernel.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-02 10:32         ` Ingo Molnar
@ 2017-09-14 14:48           ` Josh Poimboeuf
  2017-09-14 17:16             ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-14 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Alexander Potapenko, Dmitriy Vyukov,
	Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Sat, Sep 02, 2017 at 12:32:21PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Thu, Aug 31, 2017 at 12:25:42PM -0500, Josh Poimboeuf wrote:
> > > 2) Put "sp" in the clobbers list instead of as an i/o constraint.  This
> > >    mostly works for GCC, and doesn't break clang.  However, it causes
> > >    GCC to insert a "lea -0x10(%rbp),%rsp" in the epilogue of every
> > >    affected function.
> > 
> > And maybe this extra instruction is negligible for performance and not a
> > big deal?  I might look at this one after the holiday too.
> 
> Please do statistics of how many functions are affected, on a defconfig-ish 
> kernel.

As it turns out, the real problem with this option is that it imposes a
penalty for CONFIG_FRAME_POINTER=n: even with frame pointers disabled,
it forces the frame pointer to be saved for each function which uses the
inline asm "call" statements.  Our current solution doesn't do that.

- On a defconfig-based kernel, this adds +6k of .text (+0.06%).

- On a Fedora distro-based config, it adds +27k of .text (+0.3%).
  (I think the difference from defconfig is mostly caused by
  CONFIG_PARAVIRT.)

I'll try a few more experiments, but I'll probably end up engaging the
compiler people as Linus suggested.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 14:48           ` Josh Poimboeuf
@ 2017-09-14 17:16             ` Linus Torvalds
  2017-09-14 17:26               ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2017-09-14 17:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> As it turns out, the real problem with this option is that it imposes a
> penalty for CONFIG_FRAME_POINTER=n: even with frame pointers disabled,
> it forces the frame pointer to be saved for each function which uses the
> inline asm "call" statements.  Our current solution doesn't do that.

But couldn't we make the whole stack pointer clobber be dependent on
CONFIG_FRAME_POINTER?

The only reason we do it is to make sure the frame pointer is set up
before the inline asm is emitted, but with frame pointers disabled we
don't need to.

Or was there some other compiler issue?

But yes, talking to the compiler people is a good idea anyway. Both
the clang ones (marking rsp as an in/out register *really* shouldn't
cause them to generate wrong code) and the gcc people (to see if there
are other alternatives).

                      Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 17:16             ` Linus Torvalds
@ 2017-09-14 17:26               ` Josh Poimboeuf
  2017-09-14 17:33                 ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-14 17:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 10:16:08AM -0700, Linus Torvalds wrote:
> On Thu, Sep 14, 2017 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > As it turns out, the real problem with this option is that it imposes a
> > penalty for CONFIG_FRAME_POINTER=n: even with frame pointers disabled,
> > it forces the frame pointer to be saved for each function which uses the
> > inline asm "call" statements.  Our current solution doesn't do that.
> 
> But couldn't we make the whole stack pointer clobber be dependent on
> CONFIG_FRAME_POINTER?
> 
> The only reason we do it is to make sure the frame pointer is set up
> before the inline asm is emitted, but with frame pointers disabled we
> don't need to.

We could, but then that would mean either:

 a) uglifying the 15 or so relevant inline asm locations with ifdefs; or

 b) using my ASM_CALL macro, which I think you frowned upon?

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 17:26               ` Josh Poimboeuf
@ 2017-09-14 17:33                 ` Josh Poimboeuf
  2017-09-14 18:28                   ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-14 17:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 12:26:27PM -0500, Josh Poimboeuf wrote:
> On Thu, Sep 14, 2017 at 10:16:08AM -0700, Linus Torvalds wrote:
> > On Thu, Sep 14, 2017 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > As it turns out, the real problem with this option is that it imposes a
> > > penalty for CONFIG_FRAME_POINTER=n: even with frame pointers disabled,
> > > it forces the frame pointer to be saved for each function which uses the
> > > inline asm "call" statements.  Our current solution doesn't do that.
> > 
> > But couldn't we make the whole stack pointer clobber be dependent on
> > CONFIG_FRAME_POINTER?
> > 
> > The only reason we do it is to make sure the frame pointer is set up
> > before the inline asm is emitted, but with frame pointers disabled we
> > don't need to.
> 
> We could, but then that would mean either:
> 
>  a) uglifying the 15 or so relevant inline asm locations with ifdefs; or

Actually I guess we could put the "sp" in a macro...  I'll try it.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 17:33                 ` Josh Poimboeuf
@ 2017-09-14 18:28                   ` Linus Torvalds
  2017-09-14 18:45                     ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2017-09-14 18:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 10:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>>  a) uglifying the 15 or so relevant inline asm locations with ifdefs; or
>
> Actually I guess we could put the "sp" in a macro...  I'll try it.

Exactly. Do something like

   #ifdef CONFIG_FRAME_POINTER
   # define EXTRA_ASM_CLOBBERS "rsp"
   #else
   # define EXTRA_ASM_CLOBBERS
   #endif

and then replace the nasty

        register void *__sp asm(_ASM_SP);
        ..
        "+r" (__sp)

games with just that EXTRA_ASM_CLOBBERS thing at the end of the clobbers.

Yes, you'd probably have to document that the alternative_call_2()
thing doesn't take an "input" argument, but an input_and_clobbers, but
all users do that anyway.
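
A call site would then end up looking something like the sketch below
(made-up names, and assuming the non-empty variant is tweaked to
expand with a leading comma, i.e. EXTRA_ASM_CLOBBERS becoming
', "rsp"', so it can simply be tacked on after the last clobber):

   asm volatile("call foo"
                : "=a" (ret)
                : "D" (arg)
                : "memory", "cc" EXTRA_ASM_CLOBBERS);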

I dunno.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 18:28                   ` Linus Torvalds
@ 2017-09-14 18:45                     ` Josh Poimboeuf
  2017-09-15 16:10                       ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-14 18:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 11:28:30AM -0700, Linus Torvalds wrote:
> On Thu, Sep 14, 2017 at 10:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>
> >>  a) uglifying the 15 or so relevant inline asm locations with ifdefs; or
> >
> > Actually I guess we could put the "sp" in a macro...  I'll try it.
> 
> Exactly. Do something like
> 
>    #ifdef CONFIG_FRAME_POINTER
>    # define EXTRA_ASM_CLOBBERS "rsp"
>    #else
>    # define EXTRA_ASM_CLOBBERS
>    #endif
> 
> and then replace the nasty
> 
>         register void *__sp asm(_ASM_SP);
>         ..
>         "+r" (__sp)
> 
> games with just that EXTRA_ASM_CLOBBERS thing at the end of the clobbers.
> 
> Yes, you'd probably have to document that the alternative_call_2()
> thing doesn't take an "input" argument, but an input_and_clobbers, but
> all users do that anyway.
> 
> I dunno.

There's also alternative_call(), which doesn't yet have the '__rsp'
annotation, but it probably should.  It has some callers which pass
clobbers and some which don't, so its conversion would be trickier.

So my plan is to keep patch 3 of this series, which clarifies those
alternative macro interfaces, and also separates the inputs from the
clobbers.  That'll make it really easy to add something like
EXTRA_ASM_CLOBBERS above.

In fact I'll probably keep patches 1-3, because they're all
improvements.  Then I'll replace the original patch 4 (ASM_CALL) with
the "sp" clobbers thing.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-14 18:45                     ` Josh Poimboeuf
@ 2017-09-15 16:10                       ` Josh Poimboeuf
  0 siblings, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-15 16:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On Thu, Sep 14, 2017 at 01:45:29PM -0500, Josh Poimboeuf wrote:
> On Thu, Sep 14, 2017 at 11:28:30AM -0700, Linus Torvalds wrote:
> > On Thu, Sep 14, 2017 at 10:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >>
> > >>  a) uglifying the 15 or so relevant inline asm locations with ifdefs; or
> > >
> > > Actually I guess we could put the "sp" in a macro...  I'll try it.
> > 
> > Exactly. Do something like
> > 
> >    #ifdef CONFIG_FRAME_POINTER
> >    # define EXTRA_ASM_CLOBBERS "rsp"
> >    #else
> >    # define EXTRA_ASM_CLOBBERS
> >    #endif
> > 
> > and then replace the nasty
> > 
> >         register void *__sp asm(_ASM_SP);
> >         ..
> >         "+r" (__sp)
> > 
> > games with just that EXTRA_ASM_CLOBBERS thing at the end of the clobbers.
> > 
> > Yes, you'd probably have to document that the alternative_call_2()
> > thing doesn't take an "input" argument, but an input_and_clobbers, but
> > all users do that anyway.
> > 
> > I dunno.
> 
> There's also alternative_call(), which doesn't yet have the '__rsp'
> annotation, but it probably should.  It has some callers which pass
> clobbers and some which don't, so its conversion would be trickier.
> 
> So my plan is to keep patch 3 of this series, which clarifies those
> alternative macro interfaces, and also separates the inputs from the
> clobbers.  That'll make it really easy to add something like
> EXTRA_ASM_CLOBBERS above.
> 
> In fact I'll probably keep patches 1-3, because they're all
> improvements.  Then I'll replace the original patch 4 (ASM_CALL) with
> the "sp" clobbers thing.

So I couldn't figure out how to make it any simpler than this:

#ifdef CONFIG_FRAME_POINTER
# define ASM_CALL_CLOBBERS "sp"
# define ASM_CALL_CLOBBERS_APPEND , ASM_CALL_CLOBBERS
# define ASM_CALL_CLOBBERS_ARGS(args...) ASM_CALL_CLOBBERS, ## args
#else
# define ASM_CALL_CLOBBERS
# define ASM_CALL_CLOBBERS_APPEND
# define ASM_CALL_CLOBBERS_ARGS(args...) args
#endif


ASM_CALL_CLOBBERS is the normal one:

  asm volatile("call foo" : : : ASM_CALL_CLOBBERS);


ASM_CALL_CLOBBERS_APPEND is needed when combining the option with other
clobbers options, like:

  asm volatile("call foo" : : : "memory" ASM_CALL_CLOBBERS_APPEND);


ASM_CALL_CLOBBERS_ARGS is needed for the pesky alternative_call() macro
so it can work with the variadic argument:

  #define alternative_call(oldfunc, newfunc, feature, outputs, inputs,	\
  			 clobbers...)					\
  	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]",	\
  				  feature),				\
  		      : outputs						\
  		      : [old] "i" (oldfunc), [new] "i" (newfunc)	\
  		        ARGS_APPEND(inputs)				\
  		      : ASM_CALL_CLOBBERS_ARGS(clobbers))


So I *was* about ready to post something like the above.  But, of
course, the kbuild robot found that the new version of my patches
manages to crash GCC with a certain randconfig.

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82221

So the saga continues.  Now I've managed to break clang, sparse, and GCC
with each consecutive iteration of these patches...  I'm starting to
lose my faith in compilers.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-08-31 17:25     ` Josh Poimboeuf
  2017-08-31 17:31       ` Josh Poimboeuf
@ 2017-09-15 16:53       ` Andrey Ryabinin
  2017-09-15 17:20         ` Josh Poimboeuf
  2017-09-15 18:01         ` Linus Torvalds
  1 sibling, 2 replies; 26+ messages in thread
From: Andrey Ryabinin @ 2017-09-15 16:53 UTC (permalink / raw)
  To: Josh Poimboeuf, Linus Torvalds
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra



On 08/31/2017 08:25 PM, Josh Poimboeuf wrote:
>
> There have been a few other ideas which have *almost* worked:
> 
> 1) Make the 'register void *__sp asm(_ASM_SP)' a global variable instead
>    of a local one.  This works for GCC and doesn't break clang.  However
>    it resulted in a lot of changed code on the GCC side.  It looked like
>    some optimizations had been disabled, even in functions which
>    shouldn't have been affected.
> 

(For those who weren't following the previous discussion, here is the patch - http://lkml.kernel.org/r/<75850bb7-a60e-057d-d88b-afa0c79e94a0@gmail.com>
But Josh discovered that the patch causes a regression in .text size with gcc > 7.x:
http://lkml.kernel.org/r/<20170721132452.ihpws67e3e7ym3al@treble> )

I'm not so sure that this is a disabled optimization. I assume that the global rsp
changes something in gcc's register allocation logic, or something like that, leading
to subtle changes in the generated code.

But what I recently found out is that this "regression" is sometimes actually an improvement in .text size.
It all depends on the .config, e.g.:


allnoconfig:
               text    data     bss     dec     hex filename
patched	       938094  207700 1215752 2361546  2408ca allnoconfig_p/vmlinux
unpatched      938768  211796 1215752 2366316  241b6c allnoconfig/vmlinux                                                                                                                                                         


defconfig:
               text    data     bss     dec     hex filename
patched	       10691469        4882856  876544 16450869         fb0535 defconfig_p/vmlinux
unpatched      10691035        4882856  876544 16450435         fb0383 defconfig/vmlinux

allyesconfig:
               text    data     bss     dec     hex filename
patched	       159216239       153294411       49692672        362203322       1596c8ba        allyesconfig_p/vmlinux
unpatched      159224951       153294387       49692672        362212010       1596eaaa        allyesconfig/vmlinux


So there is a regression with Josh's config and defconfig, but allnoconfig and allyesconfig show an improvement.
Note that sometimes there is a difference in .data size too; don't ask me why.

Given that, perhaps the best thing to do here would be to ask the gcc devs why the generated code changes.
And if such changes are harmless (as I suspect), can't we just fix the problem with this simple patch: http://lkml.kernel.org/r/<75850bb7-a60e-057d-d88b-afa0c79e94a0@gmail.com> ?
Despite the increased .text on some configs.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-15 16:53       ` Andrey Ryabinin
@ 2017-09-15 17:20         ` Josh Poimboeuf
  2017-09-15 18:01         ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-15 17:20 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Fri, Sep 15, 2017 at 07:53:46PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 08/31/2017 08:25 PM, Josh Poimboeuf wrote:
> >
> > There have been a few other ideas which have *almost* worked:
> > 
> > 1) Make the 'register void *__sp asm(_ASM_SP)' a global variable instead
> >    of a local one.  This works for GCC and doesn't break clang.  However
> >    it resulted in a lot of changed code on the GCC side.  It looked like
> >    some optimizations had been disabled, even in functions which
> >    shouldn't have been affected.
> > 
> 
> (For those who wasn't following previous discussion there is the patch - http://lkml.kernel.org/r/<75850bb7-a60e-057d-d88b-afa0c79e94a0@gmail.com>
> But, Josh discovered that the patch causes regression in .text size with gcc > 7.x
> http://lkml.kernel.org/r/<20170721132452.ihpws67e3e7ym3al@treble> )
> 
> I'm not so sure that this is disabled optimization. I assume that global rsp makes
> changes something in gcc's register allocation logic, or something like that leading
> to subtle changes in generated code.
> 
> But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
> It all depends on .config, e.g:
> 
> 
> allnoconfig:
>                text    data     bss     dec     hex filename
> patched	       938094  207700 1215752 2361546  2408ca allnoconfig_p/vmlinux
> unpatched      938768  211796 1215752 2366316  241b6c allnoconfig/vmlinux                                                                                                                                                         
> 
> 
> defconfig:
>                text    data     bss     dec     hex filename
> patched	       10691469        4882856  876544 16450869         fb0535 defconfig_p/vmlinux
> unpatched      10691035        4882856  876544 16450435         fb0383 defconfig/vmlinux
> 
> allyesconfig:
>                text    data     bss     dec     hex filename
> patched	       159216239       153294411       49692672        362203322       1596c8ba        allyesconfig_p/vmlinux
> unpatched      159224951       153294387       49692672        362212010       1596eaaa        allyesconfig/vmlinux
> 
> 
> So there is regression with Josh's config and defconfig, but allnoconfig and allyesconfig shows improvement.
> Note that sometimes there is the difference in .data size too, don't ask me why.
> 
> Given that, perhaps the best thing to do here would be asking gcc devs why is generated code changed.
> And if such changes are harmless (as I suspect) we can just fix the problem with this simple patch http://lkml.kernel.org/r/<75850bb7-a60e-057d-d88b-afa0c79e94a0@gmail.com> ?
> Despite the increased .text on some configs.

Can you change all the other code sites in a similar way, and then run
the numbers again?

  $ git grep __sp |grep register |grep void
  arch/x86/include/asm/alternative.h:	register void *__sp asm(_ASM_SP);				      \
  arch/x86/include/asm/mshyperv.h:	register void *__sp asm(_ASM_SP);
  arch/x86/include/asm/mshyperv.h:	register void *__sp asm(_ASM_SP);
  arch/x86/include/asm/paravirt_types.h:	register void *__sp asm("esp")
  arch/x86/include/asm/paravirt_types.h:	register void *__sp asm("rsp")
  arch/x86/include/asm/preempt.h:	register void *__sp asm(_ASM_SP);			\
  arch/x86/include/asm/preempt.h:	register void *__sp asm(_ASM_SP);				\
  arch/x86/include/asm/processor.h:	register void *__sp asm(_ASM_SP);
  arch/x86/include/asm/rwsem.h:	register void *__sp asm(_ASM_SP);		\
  arch/x86/include/asm/uaccess.h:	register void *__sp asm(_ASM_SP);				\
  arch/x86/include/asm/xen/hypercall.h:	register void *__sp asm(_ASM_SP);
  arch/x86/kvm/emulate.c:	register void *__sp asm(_ASM_SP);
  arch/x86/kvm/vmx.c:	register void *__sp asm(_ASM_SP);
  arch/x86/mm/fault.c:		register void *__sp asm("rsp");

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-15 16:53       ` Andrey Ryabinin
  2017-09-15 17:20         ` Josh Poimboeuf
@ 2017-09-15 18:01         ` Linus Torvalds
  2017-09-15 23:29           ` Josh Poimboeuf
  2017-09-19 16:02           ` Josh Poimboeuf
  1 sibling, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2017-09-15 18:01 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Josh Poimboeuf, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Fri, Sep 15, 2017 at 9:53 AM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> I'm not so sure that this is disabled optimization. I assume that global rsp makes
> changes something in gcc's register allocation logic, or something like that leading
> to subtle changes in generated code.
>
> But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
> It all depends on .config, e.g:

Oh, that would be lovely and solve all the issues.

And looking at the code generation differences for one file
(kernel/futex.c) and one single config (my default config), the thing
that the global stack register seems to change is that it moves some
code - particularly completely unrelated inline asm code - inside the
region protected by frame pointers.

There are a few register allocation changes too, but they didn't seem
to make code worse, and I think they were just "incidental" from code
movement. And most of the code movement really seemed to be around inline
asms; I wonder if the gcc logic is simply something like "if the
stack pointer is visible as a register, don't move any inline asm
across a frame setup".
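
To put it in rough picture form (a hand-waved sketch, not actual
kernel/futex.c output): with the per-asm local __sp, gcc is free to
schedule an unrelated inline asm ahead of the frame setup,

  func:
          <unrelated inline asm>
          push   %rbp
          mov    %rsp,%rbp
          ...

while with the global stack register declaration the same asm stays inside
the frame-pointer-protected region:

  func:
          push   %rbp
          mov    %rsp,%rbp
          <unrelated inline asm>
          ...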

In fact, on that one file and one configuration, the resulting
assembler file had three fewer lines of code with that global stack
register declaration than with the local one.

So at least from just that one case, I can back up Andrey's
observation: it's not that the code gets worse, it just is slightly
different. Sometimes it's better.

So maybe that simple patch to just make the stack pointer be a global
register declaration really is the fix for this issue.

It's not *pretty*, and I'd much rather just see some explicit way for
us to say "this asm wants the frame to be set up", but of the
alternatives we've seen, maybe it's the right thing to do?

                   Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-15 18:01         ` Linus Torvalds
@ 2017-09-15 23:29           ` Josh Poimboeuf
  2017-09-16 22:22             ` Andrey Ryabinin
  2017-09-19 16:02           ` Josh Poimboeuf
  1 sibling, 1 reply; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-15 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Ryabinin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Fri, Sep 15, 2017 at 11:01:19AM -0700, Linus Torvalds wrote:
> On Fri, Sep 15, 2017 at 9:53 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
> >
> > I'm not so sure that this is disabled optimization. I assume that global rsp makes
> > changes something in gcc's register allocation logic, or something like that leading
> > to subtle changes in generated code.
> >
> > But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
> > It all depends on .config, e.g:
> 
> Oh, that would be lovely and solve all the issues.
> 
> And looking at the code generation differences for one file
> (kernel/futex.c) and one single config (my default config), the thing
> that the global stack register seems to change is that it moves some
> code - particularly completely unrelated inline asm code - inside the
> region protected by frame pointers.
> 
> There are a few register allocation changes too, but they didn't seem
> to make code worse, and I think they were just "incidental" from code
> movement. And most code movement really seemed to be around inline
> asms, I  wonder if the gcc logic simply is something like "if the
> stack pointer is visible as a register, don't move any inline asm
> across a frame setup".
> 
> In fact, on that one file and one configuration, the resulting
> assembler file had three fewer lines of code with that global stack
> register declaration than with the local one.
> 
> So at least from just that one case, I can back up Andrey's
> observation: it's not that the code gets worse, it just is slightly
> different. Sometimes it's better.
> 
> So maybe that simple patch to just make the stack pointer be a global
> register declaration really is the fix for this issue.
> 
> It's not *pretty*, and I'd much rather just see some explicit way for
> us to say "this asm wants the frame to be set up", but of the
> alternatives we've seen, maybe it's the right thing to do?

Ok, here's the (compile-tested) patch in case anybody wants to try it
out.


diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..c096624137ae 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,   \
 			   output, input...)				      \
 {									      \
-	register void *__sp asm(_ASM_SP);				      \
 	asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
 		"call %P[new2]", feature2)				      \
-		: output, "+r" (__sp)					      \
+		: output, ASM_CALL_CONSTRAINT				      \
 		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
 		  [new2] "i" (newfunc2), ## input);			      \
 }
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 676ee5807d86..3914cdf0e488 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -132,4 +132,9 @@
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
+#ifndef __ASSEMBLY__
+register void *__asm_call_sp asm(_ASM_SP);
+#define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
+#endif
+
 #endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 63cc96f064dc..647e2f1d4cb6 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -179,7 +179,6 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 input_address = input ? virt_to_phys(input) : 0;
 	u64 output_address = output ? virt_to_phys(output) : 0;
 	u64 hv_status;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	if (!hv_hypercall_pg)
@@ -187,7 +186,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 
 	__asm__ __volatile__("mov %4, %%r8\n"
 			     "call *%5"
-			     : "=a" (hv_status), "+r" (__sp),
+			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
 			       "+c" (control), "+d" (input_address)
 			     :  "r" (output_address), "m" (hv_hypercall_pg)
 			     : "cc", "memory", "r8", "r9", "r10", "r11");
@@ -202,7 +201,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 
 	__asm__ __volatile__("call *%7"
 			     : "=A" (hv_status),
-			       "+c" (input_address_lo), "+r" (__sp)
+			       "+c" (input_address_lo), ASM_CALL_CONSTRAINT
 			     : "A" (control),
 			       "b" (input_address_hi),
 			       "D"(output_address_hi), "S"(output_address_lo),
@@ -224,12 +223,11 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 {
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	{
 		__asm__ __volatile__("call *%4"
-				     : "=a" (hv_status), "+r" (__sp),
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
 				       "+c" (control), "+d" (input1)
 				     : "m" (hv_hypercall_pg)
 				     : "cc", "r8", "r9", "r10", "r11");
@@ -242,7 +240,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 		__asm__ __volatile__ ("call *%5"
 				      : "=A"(hv_status),
 					"+c"(input1_lo),
-					"+r"(__sp)
+					ASM_CALL_CONSTRAINT
 				      :	"A" (control),
 					"b" (input1_hi),
 					"m" (hv_hypercall_pg)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42873edd9f9d..280d94c36dad 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -459,8 +459,8 @@ int paravirt_disable_iospace(void);
  */
 #ifdef CONFIG_X86_32
 #define PVOP_VCALL_ARGS							\
-	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;	\
-	register void *__sp asm("esp")
+	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
+
 #define PVOP_CALL_ARGS			PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"a" ((unsigned long)(x))
@@ -480,8 +480,8 @@ int paravirt_disable_iospace(void);
 /* [re]ax isn't an arg, but the return val */
 #define PVOP_VCALL_ARGS						\
 	unsigned long __edi = __edi, __esi = __esi,		\
-		__edx = __edx, __ecx = __ecx, __eax = __eax;	\
-	register void *__sp asm("rsp")
+		__edx = __edx, __ecx = __ecx, __eax = __eax;
+
 #define PVOP_CALL_ARGS		PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"D" ((unsigned long)(x))
@@ -532,7 +532,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -542,7 +542,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -569,7 +569,7 @@ int paravirt_disable_iospace(void);
 		asm volatile(pre					\
 			     paravirt_alt(PARAVIRT_CALL)		\
 			     post					\
-			     : call_clbr, "+r" (__sp)			\
+			     : call_clbr, ASM_CALL_CONSTRAINT		\
 			     : paravirt_type(op),			\
 			       paravirt_clobber(clbr),			\
 			       ##__VA_ARGS__				\
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..0bfe1c272db7 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -100,19 +100,14 @@ static __always_inline bool should_resched(int preempt_offset)
 
 #ifdef CONFIG_PREEMPT
   extern asmlinkage void ___preempt_schedule(void);
-# define __preempt_schedule()					\
-({								\
-	register void *__sp asm(_ASM_SP);			\
-	asm volatile ("call ___preempt_schedule" : "+r"(__sp));	\
-})
+# define __preempt_schedule() \
+	asm volatile ("call ___preempt_schedule" : ASM_CALL_CONSTRAINT);
 
   extern asmlinkage void preempt_schedule(void);
   extern asmlinkage void ___preempt_schedule_notrace(void);
-# define __preempt_schedule_notrace()					\
-({									\
-	register void *__sp asm(_ASM_SP);				\
-	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp));	\
-})
+# define __preempt_schedule_notrace() \
+	asm volatile ("call ___preempt_schedule_notrace" : ASM_CALL_CONSTRAINT);
+
   extern asmlinkage void preempt_schedule_notrace(void);
 #endif
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..b390ff76e58f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,8 +677,6 @@ static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
-	register void *__sp asm(_ASM_SP);
-
 #ifdef CONFIG_X86_32
 	asm volatile (
 		"pushfl\n\t"
@@ -686,7 +684,7 @@ static inline void sync_core(void)
 		"pushl $1f\n\t"
 		"iret\n\t"
 		"1:"
-		: "+r" (__sp) : : "memory");
+		: ASM_CALL_CONSTRAINT : : "memory");
 #else
 	unsigned int tmp;
 
@@ -703,7 +701,7 @@ static inline void sync_core(void)
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
 #endif
 }
 
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..7116b7931c7b 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -103,7 +103,6 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 ({							\
 	long tmp;					\
 	struct rw_semaphore* ret;			\
-	register void *__sp asm(_ASM_SP);		\
 							\
 	asm volatile("# beginning down_write\n\t"	\
 		     LOCK_PREFIX "  xadd      %1,(%4)\n\t"	\
@@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 		     "  call " slow_path "\n"		\
 		     "1:\n"				\
 		     "# ending down_write"		\
-		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
+		     : "+m" (sem->count), "=d" (tmp),	\
+		       "=a" (ret), ASM_CALL_CONSTRAINT	\
 		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
 		     : "memory", "cc");			\
 	ret;						\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..78e8fcc87d4c 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,11 +166,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({									\
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
-	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
 	asm volatile("call __get_user_%P4"				\
-		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
+		     : "=a" (__ret_gu), "=r" (__val_gu),		\
+			ASM_CALL_CONSTRAINT				\
 		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
 	__builtin_expect(__ret_gu, 0);					\
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9606688caa4b..128a1a0b1450 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
 	register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
 	register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
 	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
-	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
-	register void *__sp asm(_ASM_SP);
+	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
 
-#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM	"=r" (__res), ASM_CALL_CONSTRAINT
 #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
 #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
 #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16bf6655aa85..f23f13403f33 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
 
 static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 {
-	register void *__sp asm(_ASM_SP);
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
@@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
-	      [fastop]"+S"(fop), "+r"(__sp)
+	      [fastop]"+S"(fop), ASM_CALL_CONSTRAINT
 	    : "c"(ctxt->src2.val));
 
 	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 699704d4bc9e..efedabe435c9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 {
 	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	register void *__sp asm(_ASM_SP);
 
 	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
 			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -9065,7 +9064,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_64
 			[sp]"=&r"(tmp),
 #endif
-			"+r"(__sp)
+			ASM_CALL_CONSTRAINT
 			:
 			[entry]"r"(entry),
 			[ss]"i"(__KERNEL_DS),
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b836a7274e12..39567b5c33da 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		register void *__sp asm("rsp");
 		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
@@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		asm volatile ("movq %[stack], %%rsp\n\t"
 			      "call handle_stack_overflow\n\t"
 			      "1: jmp 1b"
-			      : "+r" (__sp)
+			      : ASM_CALL_CONSTRAINT
 			      : "D" ("kernel stack overflow (page fault)"),
 				"S" (regs), "d" (address),
 				[stack] "rm" (stack));

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-15 23:29           ` Josh Poimboeuf
@ 2017-09-16 22:22             ` Andrey Ryabinin
  2017-09-18 17:40               ` Josh Poimboeuf
  0 siblings, 1 reply; 26+ messages in thread
From: Andrey Ryabinin @ 2017-09-16 22:22 UTC (permalink / raw)
  To: Josh Poimboeuf, Linus Torvalds
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Alexander Potapenko, Dmitriy Vyukov, Matthias Kaehlcke,
	Arnd Bergmann, Peter Zijlstra

On 09/16/2017 02:29 AM, Josh Poimboeuf wrote:
> On Fri, Sep 15, 2017 at 11:01:19AM -0700, Linus Torvalds wrote:
>> On Fri, Sep 15, 2017 at 9:53 AM, Andrey Ryabinin
>> <aryabinin@virtuozzo.com> wrote:
>>>
>>> I'm not so sure that this is disabled optimization. I assume that global rsp makes
>>> changes something in gcc's register allocation logic, or something like that leading
>>> to subtle changes in generated code.
>>>
>>> But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
>>> It all depends on .config, e.g:
>>
>> Oh, that would be lovely and solve all the issues.
>>
>> And looking at the code generation differences for one file
>> (kernel/futex.c) and one single config (my default config), the thing
>> that the global stack register seems to change is that it moves some
>> code - particularly completely unrelated inline asm code - inside the
>> region protected by frame pointers.
>>
>> There are a few register allocation changes too, but they didn't seem
>> to make code worse, and I think they were just "incidental" from code
>> movement. And most code movement really seemed to be around inline
>> asms, I  wonder if the gcc logic simply is something like "if the
>> stack pointer is visible as a register, don't move any inline asm
>> across a frame setup".
>>
>> In fact, on that one file and one configuration, the resulting
>> assembler file had three fewer lines of code with that global stack
>> register declaration than with the local one.
>>
>> So at least from just that one case, I can back up Andrey's
>> observation: it's not that the code gets worse, it just is slightly
>> different. Sometimes it's better.
>>
>> So maybe that simple patch to just make the stack pointer be a global
>> register declaration really is the fix for this issue.
>>
>> It's not *pretty*, and I'd much rather just see some explicit way for
>> us to say "this asm wants the frame to be set up", but of the
>> alternatives we've seen, maybe it's the right thing to do?
> 
> Ok, here's the (compile tested) patch in case anybody wants to try it
> out.
> 

I already had almost the same patch, so I just used mine. The patch seems to be
the same as yours except for cosmetic details, which shouldn't affect the end result.
But just in case, it's posted below.


The patch was applied on top of commit b38923a068c10fc36ca8f596d650d095ce390b85,
and the kernel was compiled with gcc 7.2.0.

allnoconfig:   text    data     bss     dec     hex filename
base           946920  206384 1215752 2369056  242620 allnoconfig/vmlinux
patched        946501  206384 1215752 2368637  24247d allnoconfig_p/vmlinux

defconfig:     text    data     bss     dec     hex filename
base           10729978        4840000  876544 16446522         faf43a defconfig/vmlinux
patched        10730666        4840000  876544 16447210         faf6ea defconfig_p/vmlinux

allyesconfig:   text    data     bss     dec     hex filename
base            161016572       152888347       49303552        363208471       15a61f17        allyesconfig/vmlinux
patched         161003557       152888347       49303552        363195456       15a5ec40        allyesconfig_p/vmlinux


---
 arch/x86/include/asm/alternative.h    |  3 +--
 arch/x86/include/asm/asm.h            |  2 ++
 arch/x86/include/asm/mshyperv.h       | 10 ++++------
 arch/x86/include/asm/paravirt_types.h | 12 +++++-------
 arch/x86/include/asm/preempt.h        |  6 ++----
 arch/x86/include/asm/processor.h      |  6 ++----
 arch/x86/include/asm/rwsem.h          |  3 +--
 arch/x86/include/asm/thread_info.h    |  1 -
 arch/x86/include/asm/uaccess.h        |  5 +++--
 arch/x86/include/asm/xen/hypercall.h  |  5 ++---
 arch/x86/kvm/emulate.c                |  3 +--
 arch/x86/kvm/vmx.c                    |  3 +--
 arch/x86/mm/fault.c                   |  3 +--
 13 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..8e3174258c5f 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,   \
 			   output, input...)				      \
 {									      \
-	register void *__sp asm(_ASM_SP);				      \
 	asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
 		"call %P[new2]", feature2)				      \
-		: output, "+r" (__sp)					      \
+		: output, "+r" (__current_sp)					      \
 		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
 		  [new2] "i" (newfunc2), ## input);			      \
 }
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 676ee5807d86..2ebaddb6958d 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -129,6 +129,8 @@
 # define _ASM_EXTABLE_REFCOUNT(from, to)			\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
 
+register unsigned long __current_sp asm(_ASM_SP);
+
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 63cc96f064dc..761cd7596296 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -179,7 +179,6 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 input_address = input ? virt_to_phys(input) : 0;
 	u64 output_address = output ? virt_to_phys(output) : 0;
 	u64 hv_status;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	if (!hv_hypercall_pg)
@@ -187,7 +186,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 
 	__asm__ __volatile__("mov %4, %%r8\n"
 			     "call *%5"
-			     : "=a" (hv_status), "+r" (__sp),
+			     : "=a" (hv_status), "+r" (__current_sp),
 			       "+c" (control), "+d" (input_address)
 			     :  "r" (output_address), "m" (hv_hypercall_pg)
 			     : "cc", "memory", "r8", "r9", "r10", "r11");
@@ -202,7 +201,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 
 	__asm__ __volatile__("call *%7"
 			     : "=A" (hv_status),
-			       "+c" (input_address_lo), "+r" (__sp)
+			       "+c" (input_address_lo), "+r" (__current_sp)
 			     : "A" (control),
 			       "b" (input_address_hi),
 			       "D"(output_address_hi), "S"(output_address_lo),
@@ -224,12 +223,11 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 {
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
-	register void *__sp asm(_ASM_SP);
 
 #ifdef CONFIG_X86_64
 	{
 		__asm__ __volatile__("call *%4"
-				     : "=a" (hv_status), "+r" (__sp),
+				     : "=a" (hv_status), "+r" (__current_sp),
 				       "+c" (control), "+d" (input1)
 				     : "m" (hv_hypercall_pg)
 				     : "cc", "r8", "r9", "r10", "r11");
@@ -242,7 +240,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 		__asm__ __volatile__ ("call *%5"
 				      : "=A"(hv_status),
 					"+c"(input1_lo),
-					"+r"(__sp)
+					"+r"(__current_sp)
 				      :	"A" (control),
 					"b" (input1_hi),
 					"m" (hv_hypercall_pg)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42873edd9f9d..8e0e9d39ea01 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -459,8 +459,7 @@ int paravirt_disable_iospace(void);
  */
 #ifdef CONFIG_X86_32
 #define PVOP_VCALL_ARGS							\
-	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;	\
-	register void *__sp asm("esp")
+	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 #define PVOP_CALL_ARGS			PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"a" ((unsigned long)(x))
@@ -480,8 +479,7 @@ int paravirt_disable_iospace(void);
 /* [re]ax isn't an arg, but the return val */
 #define PVOP_VCALL_ARGS						\
 	unsigned long __edi = __edi, __esi = __esi,		\
-		__edx = __edx, __ecx = __ecx, __eax = __eax;	\
-	register void *__sp asm("rsp")
+		__edx = __edx, __ecx = __ecx, __eax = __eax;
 #define PVOP_CALL_ARGS		PVOP_VCALL_ARGS
 
 #define PVOP_CALL_ARG1(x)		"D" ((unsigned long)(x))
@@ -532,7 +530,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr, "+r" (__current_sp)		\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -542,7 +540,7 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : call_clbr, "+r" (__sp)		\
+				     : call_clbr, "+r" (__current_sp)		\
 				     : paravirt_type(op),		\
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
@@ -569,7 +567,7 @@ int paravirt_disable_iospace(void);
 		asm volatile(pre					\
 			     paravirt_alt(PARAVIRT_CALL)		\
 			     post					\
-			     : call_clbr, "+r" (__sp)			\
+			     : call_clbr, "+r" (__current_sp)			\
 			     : paravirt_type(op),			\
 			       paravirt_clobber(clbr),			\
 			       ##__VA_ARGS__				\
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..82ff7cbc3c3a 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -102,16 +102,14 @@ static __always_inline bool should_resched(int preempt_offset)
   extern asmlinkage void ___preempt_schedule(void);
 # define __preempt_schedule()					\
 ({								\
-	register void *__sp asm(_ASM_SP);			\
-	asm volatile ("call ___preempt_schedule" : "+r"(__sp));	\
+	asm volatile ("call ___preempt_schedule" : "+r"(__current_sp));	\
 })
 
   extern asmlinkage void preempt_schedule(void);
   extern asmlinkage void ___preempt_schedule_notrace(void);
 # define __preempt_schedule_notrace()					\
 ({									\
-	register void *__sp asm(_ASM_SP);				\
-	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp));	\
+	asm volatile ("call ___preempt_schedule_notrace" : "+r"(__current_sp));	\
 })
   extern asmlinkage void preempt_schedule_notrace(void);
 #endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..a6ced4b30c2f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,8 +677,6 @@ static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
-	register void *__sp asm(_ASM_SP);
-
 #ifdef CONFIG_X86_32
 	asm volatile (
 		"pushfl\n\t"
@@ -686,7 +684,7 @@ static inline void sync_core(void)
 		"pushl $1f\n\t"
 		"iret\n\t"
 		"1:"
-		: "+r" (__sp) : : "memory");
+		: "+r" (__current_sp) : : "memory");
 #else
 	unsigned int tmp;
 
@@ -703,7 +701,7 @@ static inline void sync_core(void)
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+		: "=&r" (tmp), "+r" (__current_sp) : : "cc", "memory");
 #endif
 }
 
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..a6b4c7e2b90a 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -103,7 +103,6 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 ({							\
 	long tmp;					\
 	struct rw_semaphore* ret;			\
-	register void *__sp asm(_ASM_SP);		\
 							\
 	asm volatile("# beginning down_write\n\t"	\
 		     LOCK_PREFIX "  xadd      %1,(%4)\n\t"	\
@@ -114,7 +113,7 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
 		     "  call " slow_path "\n"		\
 		     "1:\n"				\
 		     "# ending down_write"		\
-		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
+		     : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__current_sp) \
 		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
 		     : "memory", "cc");			\
 	ret;						\
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 5161da1a0fa0..d8f225efab8e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -157,7 +157,6 @@ struct thread_info {
  * preempt_count needs to be 1 initially, until the scheduler is functional.
  */
 #ifndef __ASSEMBLY__
-
 static inline unsigned long current_stack_pointer(void)
 {
 	unsigned long sp;
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..90098148e6b7 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -162,15 +162,16 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
  * Clang/LLVM cares about the size of the register, but still wants
  * the base register for something that ends up being a pair.
  */
+
 #define get_user(x, ptr)						\
 ({									\
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
-	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
 	asm volatile("call __get_user_%P4"				\
-		     : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)	\
+		     : "=a" (__ret_gu), "=r" (__val_gu), 		\
+		       "+r" (__current_sp)				\
 		     : "0" (ptr), "i" (sizeof(*(ptr))));		\
 	(x) = (__force __typeof__(*(ptr))) __val_gu;			\
 	__builtin_expect(__ret_gu, 0);					\
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9606688caa4b..7adb02fbeead 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
 	register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
 	register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
 	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
-	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
-	register void *__sp asm(_ASM_SP);
+	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
 
-#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM	"=r" (__res), "+r" (__current_sp)
 #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
 #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
 #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16bf6655aa85..29c797dc40aa 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
 
 static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 {
-	register void *__sp asm(_ASM_SP);
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
@@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
-	      [fastop]"+S"(fop), "+r"(__sp)
+	      [fastop]"+S"(fop), "+r"(__current_sp)
 	    : "c"(ctxt->src2.val));
 
 	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 699704d4bc9e..9513c9f28312 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 {
 	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	register void *__sp asm(_ASM_SP);
 
 	if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
 			== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -9065,7 +9064,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_64
 			[sp]"=&r"(tmp),
 #endif
-			"+r"(__sp)
+			"+r"(__current_sp)
 			:
 			[entry]"r"(entry),
 			[ss]"i"(__KERNEL_DS),
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b836a7274e12..071b459029fe 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	if (is_vmalloc_addr((void *)address) &&
 	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
 	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
-		register void *__sp asm("rsp");
 		unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
 		/*
 		 * We're likely to be running with very little stack space
@@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		asm volatile ("movq %[stack], %%rsp\n\t"
 			      "call handle_stack_overflow\n\t"
 			      "1: jmp 1b"
-			      : "+r" (__sp)
+			      : "+r" (__current_sp)
 			      : "D" ("kernel stack overflow (page fault)"),
 				"S" (regs), "d" (address),
 				[stack] "rm" (stack));
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-16 22:22             ` Andrey Ryabinin
@ 2017-09-18 17:40               ` Josh Poimboeuf
  0 siblings, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-18 17:40 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Sun, Sep 17, 2017 at 01:22:32AM +0300, Andrey Ryabinin wrote:
> On 09/16/2017 02:29 AM, Josh Poimboeuf wrote:
> > On Fri, Sep 15, 2017 at 11:01:19AM -0700, Linus Torvalds wrote:
> >> On Fri, Sep 15, 2017 at 9:53 AM, Andrey Ryabinin
> >> <aryabinin@virtuozzo.com> wrote:
> >>>
> >>> I'm not so sure that this is disabled optimization. I assume that global rsp makes
> >>> changes something in gcc's register allocation logic, or something like that leading
> >>> to subtle changes in generated code.
> >>>
> >>> But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
> >>> It all depends on .config, e.g:
> >>
> >> Oh, that would be lovely and solve all the issues.
> >>
> >> And looking at the code generation differences for one file
> >> (kernel/futex.c) and one single config (my default config), the thing
> >> that the global stack register seems to change is that it moves some
> >> code - particularly completely unrelated inline asm code - inside the
> >> region protected by frame pointers.
> >>
> >> There are a few register allocation changes too, but they didn't seem
> >> to make code worse, and I think they were just "incidental" from code
> >> movement. And most code movement really seemed to be around inline
> >> asms, I  wonder if the gcc logic simply is something like "if the
> >> stack pointer is visible as a register, don't move any inline asm
> >> across a frame setup".
> >>
> >> In fact, on that one file and one configuration, the resulting
> >> assembler file had three fewer lines of code with that global stack
> >> register declaration than with the local one.
> >>
> >> So at least from just that one case, I can back up Andrey's
> >> observation: it's not that the code gets worse, it just is slightly
> >> different. Sometimes it's better.
> >>
> >> So maybe that simple patch to just make the stack pointer be a global
> >> register declaration really is the fix for this issue.
> >>
> >> It's not *pretty*, and I'd much rather just see some explicit way for
> >> us to say "this asm wants the frame to be set up", but of the
> >> alternatives we've seen, maybe it's the right thing to do?
> > 
> > Ok, here's the (compile tested) patch in case anybody wants to try it
> > out.
> > 
> 
> I already had almost the same patch, so I just used mine. Patch seems to be
> the same as yours except cosmetic details which shouldn't affect the end result.
> But just in case it posted below.
> 
> 
> The patch was applied on top of b38923a068c10fc36ca8f596d650d095ce390b85 commit,
> kernel compiled gcc 7.2.0 
> 
> allnoconfig:   text    data     bss     dec     hex filename
> base           946920  206384 1215752 2369056  242620 allnoconfig/vmlinux
> patched        946501  206384 1215752 2368637  24247d allnoconfig_p/vmlinux
> 
> defconfig:     text    data     bss     dec     hex filename
> base           10729978        4840000  876544 16446522         faf43a defconfig/vmlinux
> patched        10730666        4840000  876544 16447210         faf6ea defconfig_p/vmlinux
> 
> allyesconfig:   text    data     bss     dec     hex filename
> base            161016572       152888347       49303552        363208471       15a61f17        allyesconfig/vmlinux
> patched         161003557       152888347       49303552        363195456       15a5ec40        allyesconfig_p/vmlinux

I get similar results with my config: slightly smaller .text, both with
*and* without frame pointers.  So yeah, this is probably the best
option, or at least, the least-worst option.  I shouldn't have dismissed
the idea so quickly before.

(H.J. Lu suggested another idea, which is to use "rbp" as an input
constraint.  I tried it, but it added 22k of text to my kernel with
frame pointers disabled, so that's definitely worse than this.)

If testing continues to go well, I'll submit an official patch soon.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent
  2017-09-15 18:01         ` Linus Torvalds
  2017-09-15 23:29           ` Josh Poimboeuf
@ 2017-09-19 16:02           ` Josh Poimboeuf
  1 sibling, 0 replies; 26+ messages in thread
From: Josh Poimboeuf @ 2017-09-19 16:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Ryabinin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Alexander Potapenko,
	Dmitriy Vyukov, Matthias Kaehlcke, Arnd Bergmann, Peter Zijlstra

On Fri, Sep 15, 2017 at 11:01:19AM -0700, Linus Torvalds wrote:
> And most code movement really seemed to be around inline asms, I
> wonder if the gcc logic simply is something like "if the stack pointer
> is visible as a register, don't move any inline asm across a frame
> setup".

After some testing, that seems to be exactly right.  Making the stack
pointer a global register variable seems to cause *all* inline asm to be
treated like it has a call instruction.

And setting the stack pointer as a constraint becomes a no-op.

So basically if we just put

  register void *__asm_call_sp asm(_ASM_SP);

in a globally visible header file (e.g., arch/x86/include/asm/asm.h), GCC
will *always* make sure the frame pointer is set up before inserting
*any* inline asm.

The upside is that we don't have to manually annotate inline asm anymore
with those awful "+r" (__sp) constraints.
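
For example (a sketch based on the preempt.h case from the earlier patch,
assuming the global declaration is in place), instead of

  register void *__sp asm(_ASM_SP);
  asm volatile ("call ___preempt_schedule" : "+r" (__sp));

each such site could simply be

  asm volatile ("call ___preempt_schedule");

with gcc ensuring the frame is set up either way.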

The downside is that this creates a very slight performance issue for
leaf functions which use inline assembly without call instructions,
because they're getting frame pointer setup when they don't need it.
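
An affected leaf would be something like this (just an illustrative
sketch, not a specific kernel function):

  static unsigned long read_flags_example(void)
  {
          unsigned long flags;

          /* no call here, but the asm still triggers frame setup now */
          asm volatile ("pushf ; pop %0" : "=rm" (flags));
          return flags;
  }

With the global stack pointer declaration, gcc treats that asm like it
contains a call, so it gets frame pointer setup it doesn't need.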

But based on the text size difference, the performance impact should be
negligible.  And it's only an issue for frame pointer-based kernels,
which may soon no longer be the default anyway.

And for non-frame-pointer-based kernels, it actually saves quite a bit
of text.

With GCC, here are the vmlinux .text size differences for the following
configs:

- defconfig
- defconfig without frame pointers
- Fedora distro config
- Fedora distro config without frame pointers

	defconfig	defconfig-nofp	distro		distro-nofp
before	9796300		9466764		9076191		8789745
after	9796941		9462859		9076381		8785325

If there are no objections, I'll go forward with the patch.

-- 
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-09-19 16:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-31 14:11 [RFC PATCH 0/4] x86/asm: Add ASM_CALL() macro for inline asms with call instructions Josh Poimboeuf
2017-08-31 14:11 ` [RFC PATCH 1/4] x86/paravirt: Fix output constraint macro names Josh Poimboeuf
2017-08-31 14:11 ` [RFC PATCH 2/4] x86/asm: Convert some inline asm positional operands to named operands Josh Poimboeuf
2017-08-31 14:11 ` [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more clear and consistent Josh Poimboeuf
2017-08-31 16:11   ` Linus Torvalds
2017-08-31 17:25     ` Josh Poimboeuf
2017-08-31 17:31       ` Josh Poimboeuf
2017-09-02 10:32         ` Ingo Molnar
2017-09-14 14:48           ` Josh Poimboeuf
2017-09-14 17:16             ` Linus Torvalds
2017-09-14 17:26               ` Josh Poimboeuf
2017-09-14 17:33                 ` Josh Poimboeuf
2017-09-14 18:28                   ` Linus Torvalds
2017-09-14 18:45                     ` Josh Poimboeuf
2017-09-15 16:10                       ` Josh Poimboeuf
2017-09-15 16:53       ` Andrey Ryabinin
2017-09-15 17:20         ` Josh Poimboeuf
2017-09-15 18:01         ` Linus Torvalds
2017-09-15 23:29           ` Josh Poimboeuf
2017-09-16 22:22             ` Andrey Ryabinin
2017-09-18 17:40               ` Josh Poimboeuf
2017-09-19 16:02           ` Josh Poimboeuf
2017-08-31 14:11 ` [RFC PATCH 4/4] x86/asm: Use ASM_CALL() macro for inline asm statements with call instructions Josh Poimboeuf
2017-08-31 14:50   ` Peter Zijlstra
2017-08-31 15:21     ` Josh Poimboeuf
2017-08-31 15:36       ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).