* [PATCH v3 00/11] i386 Clang support
@ 2020-07-20 20:49 Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Nick Desaulniers

Resend of Brian's v2 with Acks from Peter and Linus collected, as well
as the final patch (mine) added. The commit message of the final patch
discusses some of the architectural differences between GCC and Clang,
and the kernel's tickling of this difference for i386, which
necessitated these patches.

Brian Gerst (10):
  x86/percpu: Introduce size abstraction macros
  x86/percpu: Clean up percpu_to_op()
  x86/percpu: Clean up percpu_from_op()
  x86/percpu: Clean up percpu_add_op()
  x86/percpu: Remove "e" constraint from XADD
  x86/percpu: Clean up percpu_add_return_op()
  x86/percpu: Clean up percpu_xchg_op()
  x86/percpu: Clean up percpu_cmpxchg_op()
  x86/percpu: Clean up percpu_stable_op()
  x86/percpu: Remove unused PER_CPU() macro

Nick Desaulniers (1):
  x86: support i386 with Clang

 arch/x86/include/asm/percpu.h  | 510 +++++++++++----------------------
 arch/x86/include/asm/uaccess.h |   4 +-
 2 files changed, 175 insertions(+), 339 deletions(-)

-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.
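
As a minimal sketch of the token-pasting pattern these macros enable
(the my_* names below are hypothetical userspace analogues, not part
of the patch):

#include <stdio.h>
#include <stdint.h>

#define my_type_4 uint32_t
#define my_op2_4(op, src, dst) op "l " src ", " dst

int main(void)
{
	my_type_4 v = 0;	/* the size suffix 4 selects the u32 analogue */
	/* String pasting builds a single instruction template: */
	puts(my_op2_4("mov", "%%fs:0", "%0"));	/* prints: movl %%fs:0, %0 */
	return (int)v;
}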

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797c769d..19838e4f7a8f 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
 
+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
+#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
+#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(mod, x) mod "q" (x)
+#define __pcpu_reg_2(mod, x) mod "r" (x)
+#define __pcpu_reg_4(mod, x) mod "r" (x)
+#define __pcpu_reg_8(mod, x) mod "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
 #define percpu_to_op(qual, op, var, val)		\
 do {							\
 	typedef typeof(var) pto_T__;			\
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
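
For context, that generic dispatch looks roughly like this
(paraphrased from include/linux/percpu-defs.h; see the tree for the
exact definition), so each x86 _1/_2/_4/_8 variant is only ever
instantiated for its own width:

#define __pcpu_size_call(stem, variable, ...)				\
do {									\
	__verify_pcpu_ptr(&(variable));					\
	switch (sizeof(variable)) {					\
	case 1: stem##1(variable, __VA_ARGS__); break;			\
	case 2: stem##2(variable, __VA_ARGS__); break;			\
	case 4: stem##4(variable, __VA_ARGS__); break;			\
	case 8: stem##8(variable, __VA_ARGS__); break;			\
	default:							\
		__bad_size_call_parameter(); break;			\
	}								\
} while (0)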

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
 1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4f7a8f..fb280fba94c5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
 #define __pcpu_reg_imm_4(x) "ri" (x)
 #define __pcpu_reg_imm_8(x) "re" (x)
 
-#define percpu_to_op(qual, op, var, val)		\
-do {							\
-	typedef typeof(var) pto_T__;			\
-	if (0) {					\
-		pto_T__ pto_tmp__;			\
-		pto_tmp__ = (val);			\
-		(void)pto_tmp__;			\
-	}						\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "qi" ((pto_T__)(val)));		\
-		break;					\
-	case 2:						\
-		asm qual (op "w %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 4:						\
-		asm qual (op "l %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 8:						\
-		asm qual (op "q %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "re" ((pto_T__)(val)));		\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
+#define percpu_to_op(size, qual, op, _var, _val)			\
+do {									\
+	__pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);	\
+	if (0) {		                                        \
+		typeof(_var) pto_tmp__;					\
+		pto_tmp__ = (_val);					\
+		(void)pto_tmp__;					\
+	}								\
+	asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))	\
+	    : [var] "+m" (_var)						\
+	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
 /*
@@ -425,18 +405,18 @@ do {									\
 #define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
 
-#define raw_cpu_write_1(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val)	percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val)		percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val)		percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val)		percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val)		percpu_to_op(4, , "or", (pcp), val)
 
 /*
  * raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do {									\
 #define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
 #define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do {									\
  */
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val)		percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val)			percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val)			percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val)		percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val)			percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fba94c5..a40d2e055f58 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do {									\
 	}								\
 } while (0)
 
-#define percpu_from_op(qual, op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_from_op(size, qual, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "m" (_var));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 #define percpu_stable_op(op, var)			\
@@ -401,9 +381,9 @@ do {									\
  */
 #define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
 
-#define raw_cpu_read_1(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
 
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do {									\
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do {									\
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do {									\
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
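
The inc/dec special case that survives the cleanup can be sketched in
plain C (a hypothetical 32-bit-only add_op(), illustrative rather than
the kernel macro):

#define add_op(var, val)						\
do {									\
	/* Fold a compile-time +1/-1 into inc/dec, else use add. */	\
	const int id = (__builtin_constant_p(val) &&			\
			((val) == 1 || (val) == -1)) ? (int)(val) : 0;	\
	if (id == 1)							\
		asm("incl %0" : "+m" (var));				\
	else if (id == -1)						\
		asm("decl %0" : "+m" (var));				\
	else								\
		asm("addl %1, %0" : "+m" (var) : "ri" (val));		\
} while (0)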

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
 1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e055f58..2a24f3c795eb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do {									\
 	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
+#define percpu_unary_op(size, qual, op, _var)				\
+({									\
+	asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))		\
+	    : [var] "+m" (_var));					\
+})
+
 /*
  * Generate a percpu add to memory instruction and optimize code
  * if one is added or subtracted.
  */
-#define percpu_add_op(qual, var, val)					\
+#define percpu_add_op(size, qual, var, val)				\
 do {									\
-	typedef typeof(var) pao_T__;					\
 	const int pao_ID__ = (__builtin_constant_p(val) &&		\
 			      ((val) == 1 || (val) == -1)) ?		\
 				(int)(val) : 0;				\
 	if (0) {							\
-		pao_T__ pao_tmp__;					\
+		typeof(var) pao_tmp__;					\
 		pao_tmp__ = (val);					\
 		(void)pao_tmp__;					\
 	}								\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incb "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decb "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addb %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "qi" ((pao_T__)(val)));			\
-		break;							\
-	case 2:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incw "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decw "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addw %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 4:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incl "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decl "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addl %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 8:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incq "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decq "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addq %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "re" ((pao_T__)(val)));			\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
+	if (pao_ID__ == 1)						\
+		percpu_unary_op(size, qual, "inc", var);		\
+	else if (pao_ID__ == -1)					\
+		percpu_unary_op(size, qual, "dec", var);		\
+	else								\
+		percpu_to_op(size, qual, "add", var, val);		\
 } while (0)
 
 #define percpu_from_op(size, qual, op, _var)				\
@@ -228,29 +196,6 @@ do {									\
 	pfo_ret__;					\
 })
 
-#define percpu_unary_op(qual, op, var)			\
-({							\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-})
-
 /*
  * Add return operation
  */
@@ -388,9 +333,9 @@ do {									\
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
 #define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
 #define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
 #define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do {									\
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
 #define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
 #define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
 #define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do {									\
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do {									\
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.
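
A minimal sketch of the corrected constraint (hypothetical helper, not
from the patch): XADD's source operand must be a register, so "+r" is
sufficient; "e" could never be satisfied here anyway, since a
read-write operand cannot be a constant.

static inline long xadd_long(long *p, long v)
{
	/* v is exchanged into *p while the old *p is returned in v. */
	asm volatile("xaddq %0, %1" : "+r" (v), "+m" (*p) : : "memory");
	return v;
}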

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c795eb..9bb5440d98d3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do {									\
 		break;							\
 	case 8:								\
 		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+re" (paro_ret__), "+m" (var)		\
+			    : "+r" (paro_ret__), "+m" (var)		\
 			    : : "memory");				\
 		break;							\
 	default: __bad_percpu_size();					\
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
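
The detail worth noting in the rewrite: XADD leaves the *old* value in
the register operand, so the macro adds the addend back to produce the
post-add return value. A hypothetical standalone equivalent:

static inline int add_return_int(int *p, int v)
{
	int old = v;
	/* After XADD: old == previous *p, *p == previous *p + v. */
	asm volatile("xaddl %0, %1" : "+r" (old), "+m" (*p) : : "memory");
	return old + v;		/* the new value of *p */
}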

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440d98d3..0776a11e7e11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do {									\
 /*
  * Add return operation
  */
-#define percpu_add_return_op(qual, var, val)				\
+#define percpu_add_return_op(size, qual, _var, _val)			\
 ({									\
-	typeof(var) paro_ret__ = val;					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("xaddb %0, "__percpu_arg(1)			\
-			    : "+q" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("xaddw %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("xaddl %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	paro_ret__ += val;						\
-	paro_ret__;							\
+	__pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);	\
+	asm qual (__pcpu_op2_##size("xadd", "%[tmp]",			\
+				     __percpu_arg([var]))		\
+		  : [tmp] __pcpu_reg_##size("+", paro_tmp__),		\
+		    [var] "+m" (_var)					\
+		  : : "memory");					\
+	(typeof(_var))(unsigned long) (paro_tmp__ + _val);		\
 })
 
 /*
@@ -377,16 +358,16 @@ do {									\
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 
-#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
 #define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
 #define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do {									\
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
@@ -427,7 +408,7 @@ do {									\
 #define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
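
The cmpxchg loop this macro emits can be sketched as a standalone
helper (hypothetical; the real macro addresses the variable through
the percpu segment, and the missing LOCK prefix is safe only because
percpu data is CPU-local):

static inline unsigned int xchg_uint(unsigned int *p, unsigned int nval)
{
	unsigned int old;
	asm volatile("mov %1, %0\n"
		     "1:\tcmpxchg %2, %1\n\t"
		     "jnz 1b"
		     : "=&a" (old), "+m" (*p)
		     : "r" (nval)
		     : "memory");
	return old;	/* previous value of *p */
}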

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
 1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11e7e11..ac6d7e76c0d4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do {									\
  * expensive due to the implied lock prefix.  The processor cannot prefetch
  * cachelines if xchg is used.
  */
-#define percpu_xchg_op(qual, var, nval)					\
+#define percpu_xchg_op(size, qual, _var, _nval)				\
 ({									\
-	typeof(var) pxo_ret__;						\
-	typeof(var) pxo_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%al"		\
-		    "\n1:\tcmpxchgb %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "q" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%ax"		\
-		    "\n1:\tcmpxchgw %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%eax"		\
-		    "\n1:\tcmpxchgl %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%rax"		\
-		    "\n1:\tcmpxchgq %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pxo_ret__;							\
+	__pcpu_type_##size pxo_old__;					\
+	__pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]),		\
+				    "%[oval]")				\
+		  "\n1:\t"						\
+		  __pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  "\n\tjnz 1b"						\
+		  : [oval] "=&a" (pxo_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pxo_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pxo_old__;			\
 })
 
 /*
@@ -354,9 +329,9 @@ do {									\
 #define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
 #define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
 #define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(4, volatile, pcp, nval)
 
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do {									\
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
 /*
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op() Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
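
The "+a" constraint is the load-bearing part here: CMPXCHG implicitly
compares the accumulator against memory and, on failure, leaves the
current memory value in the accumulator. A hypothetical standalone
sketch:

static inline unsigned int cmpxchg_uint(unsigned int *p,
					unsigned int old, unsigned int new)
{
	asm volatile("cmpxchgl %2, %1"
		     : "+a" (old), "+m" (*p)	/* old in/out via eax */
		     : "r" (new)
		     : "memory");
	return old;	/* value of *p observed by the instruction */
}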

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e76c0d4..7efc0b5c4ff0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do {									\
  * cmpxchg has no such implied lock semantics as a result it is much
  * more efficient for cpu local operations.
  */
-#define percpu_cmpxchg_op(qual, var, oval, nval)			\
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)		\
 ({									\
-	typeof(var) pco_ret__;						\
-	typeof(var) pco_old__ = (oval);					\
-	typeof(var) pco_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("cmpxchgb %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "q" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("cmpxchgw %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("cmpxchgl %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("cmpxchgq %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pco_ret__;							\
+	__pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);	\
+	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  : [oval] "+a" (pco_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pco_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pco_old__;			\
 })
 
 /*
@@ -336,16 +314,16 @@ do {									\
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
 #define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
 #define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)		\
@@ -376,7 +354,7 @@ do {									\
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do {									\
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
 
 /*
  * Pretty complex macro to generate cmpxchg16 instruction.  The instruction
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op()
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size(), which is now unused.
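
For context, the generic accessor it now reuses dispatches on size
roughly like this (paraphrased from include/linux/percpu-defs.h; see
the tree for the exact definition):

#define __pcpu_size_call_return(stem, variable)				\
({									\
	typeof(variable) pscr_ret__;					\
	__verify_pcpu_ptr(&(variable));					\
	switch (sizeof(variable)) {					\
	case 1: pscr_ret__ = stem##1(variable); break;			\
	case 2: pscr_ret__ = stem##2(variable); break;			\
	case 4: pscr_ret__ = stem##4(variable); break;			\
	case 8: pscr_ret__ = stem##8(variable); break;			\
	default:							\
		__bad_size_call_parameter(); break;			\
	}								\
	pscr_ret__;							\
})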

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5c4ff0..cf2b9c2a241e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@
 
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
-extern void __bad_percpu_size(void);
 
 #define __pcpu_type_1 u8
 #define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do {									\
 	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
-#define percpu_stable_op(op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_stable_op(size, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "p" (&(_var)));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 /*
@@ -258,7 +237,11 @@ do {									\
  * per-thread variables implemented as per-cpu variables and thus
  * stable for the duration of the respective task.
  */
-#define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp)	percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp)	percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp)	percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp)	percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp)	__pcpu_size_call_return(this_cpu_read_stable_, pcp)
 
 #define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
 #define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op() Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
  2020-07-20 20:49 ` [PATCH v3 11/11] x86: support i386 with Clang Nick Desaulniers
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Brian Gerst, Nick Desaulniers, Linus Torvalds

From: Brian Gerst <brgerst@gmail.com>

Also remove the now-unused __percpu_mov_op.

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2a241e..a3c33b79fb86 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@
 
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
-#define __percpu_mov_op		movq
 #else
 #define __percpu_seg		fs
-#define __percpu_mov_op		movl
 #endif
 
 #ifdef __ASSEMBLY__
 
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, reg)						\
-	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
-	lea var(reg), reg
 #define PER_CPU_VAR(var)	%__percpu_seg:var
 #else /* ! SMP */
-#define PER_CPU(var, reg)	__percpu_mov_op $var, reg
 #define PER_CPU_VAR(var)	var
 #endif	/* SMP */
 
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro Nick Desaulniers
@ 2020-07-20 20:49 ` Nick Desaulniers
  2020-07-23  9:14   ` Thomas Gleixner
  2020-07-23 10:43   ` [tip: x86/asm] x86/uaccess: Make __get_user_size() Clang compliant on 32-bit tip-bot2 for Nick Desaulniers
  2020-07-21  8:09 ` [PATCH v3 00/11] i386 Clang support Sedat Dilek
  2020-07-21 22:27 ` Dennis Zhou
From: Nick Desaulniers @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Nick Desaulniers, Arnd Bergmann,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

GCC and Clang are architecturally different, which leads to subtle
issues for code that's invalid but clearly dead. This can happen with
code that emulates polymorphism with the preprocessor and sizeof.

GCC will perform semantic analysis after early inlining and dead code
elimination, so it will not warn on invalid code that's dead. Clang
strictly performs optimizations after semantic analysis, so it will
warn even on invalid code that is dead.

Neither Clang nor GCC likes this very much with -m32:

long long ret;
asm ("movb $5, %0" : "=q" (ret));

However, GCC can tolerate this variant:

long long ret;
switch (sizeof(ret)) {
case 1:
        asm ("movb $5, %0" : "=q" (ret));
        break;
case 8:;
}

Clang, on the other hand, won't accept that because it validates the
inline asm for the '1' case *before* the optimization phase, where it
realizes that it wouldn't have to emit it anyway.

If LLVM (Clang's "back end") fails later, such as during instruction
selection or register allocation, it cannot provide accurate
diagnostics (warnings/errors) that contain line information, as the
AST has been discarded from memory at that point.

While there have been early discussions about having C/C++ specific
language optimizations in Clang via the use of MLIR, which would enable
such earlier optimizations, such work is not scoped and likely a
multi-year endeavor.

We also don't want to swap the use of "=q" with "=r". For 64b, it
doesn't matter. For 32b, it's possible that a 32b register without an
8b lower alias (i.e. ESI, EDI, EBP) is selected, which the assembler
will then reject.
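
A minimal sketch of the constraint difference (hypothetical helper,
not from this patch):

static inline unsigned char read_byte(const unsigned char *p)
{
	unsigned char ret;
	/* "=q" restricts allocation to a/b/c/d, which have 8b
	 * subregisters on i386; "=r" could pick ESI/EDI/EBP, whose
	 * low bytes cannot be encoded. */
	asm("movb %1, %0" : "=q" (ret) : "m" (*p));
	return ret;
}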

With this, Clang can finally build an i386 defconfig.

Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/3
Link: https://github.com/ClangBuiltLinux/linux/issues/194
Link: https://github.com/ClangBuiltLinux/linux/issues/781
Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dwmw2@infradead.org/
Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/uaccess.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index dd3261f9f4ea..9d57556ad42f 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -314,11 +314,13 @@ do {									\
 
 #define __get_user_size(x, ptr, size, retval)				\
 do {									\
+	unsigned char x_u8__;						\
 	retval = 0;							\
 	__chk_user_ptr(ptr);						\
 	switch (size) {							\
 	case 1:								\
-		__get_user_asm(x, ptr, retval, "b", "=q");		\
+		__get_user_asm(x_u8__, ptr, retval, "b", "=q");		\
+		(x) = x_u8__;						\
 		break;							\
 	case 2:								\
 		__get_user_asm(x, ptr, retval, "w", "=r");		\
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* Re: [PATCH v3 00/11] i386 Clang support
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
  2020-07-20 20:49 ` [PATCH v3 11/11] x86: support i386 with Clang Nick Desaulniers
@ 2020-07-21  8:09 ` Sedat Dilek
  2020-07-21 22:27 ` Dennis Zhou
  12 siblings, 0 replies; 34+ messages in thread
From: Sedat Dilek @ 2020-07-21  8:09 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dennis Zhou,
	Tejun Heo, Christoph Lameter, x86, H. Peter Anvin, Al Viro,
	Andrew Morton, Peter Zijlstra, linux-kernel,
	Clang-Built-Linux ML

On Mon, Jul 20, 2020 at 10:49 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> as the final patch (mine) added. The commit of the final patch discusses
> some of the architectural differences between GCC and Clang, and the
> kernels tickling of this difference for i386, which necessitated these
> patches.
>
> Brian Gerst (10):
>   x86/percpu: Introduce size abstraction macros
>   x86/percpu: Clean up percpu_to_op()
>   x86/percpu: Clean up percpu_from_op()
>   x86/percpu: Clean up percpu_add_op()
>   x86/percpu: Remove "e" constraint from XADD
>   x86/percpu: Clean up percpu_add_return_op()
>   x86/percpu: Clean up percpu_xchg_op()
>   x86/percpu: Clean up percpu_cmpxchg_op()
>   x86/percpu: Clean up percpu_stable_op()
>   x86/percpu: Remove unused PER_CPU() macro
>
> Nick Desaulniers (1):
>   x86: support i386 with Clang
>
>  arch/x86/include/asm/percpu.h  | 510 +++++++++++----------------------
>  arch/x86/include/asm/uaccess.h |   4 +-
>  2 files changed, 175 insertions(+), 339 deletions(-)
>
> --
> 2.28.0.rc0.105.gf9edc3c819-goog
>

Hi,

I have tested this patchset v3 on top of Linux v5.8-rc6 with a
self-built LLVM toolchain (snapshot version: v11+git-e05c7e400f3a plus
cherry-picked 8b354cc8db41).

I did this out of interest, to see whether the series is in good shape
on my Debian AMD64 system.
I checked my build log and dmesg output and it looks good.

Feel free to add my...

Tested-by: Sedat Dilek <sedat.dilek@gmail.com>

Thanks.

Regards,
- Sedat -

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/11] i386 Clang support
  2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
                   ` (11 preceding siblings ...)
  2020-07-21  8:09 ` [PATCH v3 00/11] i386 Clang support Sedat Dilek
@ 2020-07-21 22:27 ` Dennis Zhou
  2020-07-22 23:08   ` Thomas Gleixner
  12 siblings, 1 reply; 34+ messages in thread
From: Dennis Zhou @ 2020-07-21 22:27 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dennis Zhou,
	Tejun Heo, Christoph Lameter, x86, H. Peter Anvin, Al Viro,
	Andrew Morton, Peter Zijlstra, linux-kernel, clang-built-linux

On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> as the final patch (mine) added. The commit of the final patch discusses
> some of the architectural differences between GCC and Clang, and the
> kernels tickling of this difference for i386, which necessitated these
> patches.
> 
> Brian Gerst (10):
>   x86/percpu: Introduce size abstraction macros
>   x86/percpu: Clean up percpu_to_op()
>   x86/percpu: Clean up percpu_from_op()
>   x86/percpu: Clean up percpu_add_op()
>   x86/percpu: Remove "e" constraint from XADD
>   x86/percpu: Clean up percpu_add_return_op()
>   x86/percpu: Clean up percpu_xchg_op()
>   x86/percpu: Clean up percpu_cmpxchg_op()
>   x86/percpu: Clean up percpu_stable_op()
>   x86/percpu: Remove unused PER_CPU() macro
> 
> Nick Desaulniers (1):
>   x86: support i386 with Clang
> 
>  arch/x86/include/asm/percpu.h  | 510 +++++++++++----------------------
>  arch/x86/include/asm/uaccess.h |   4 +-
>  2 files changed, 175 insertions(+), 339 deletions(-)
> 
> -- 
> 2.28.0.rc0.105.gf9edc3c819-goog
> 

This looks great to me! I applied it to for-5.9.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/11] i386 Clang support
  2020-07-21 22:27 ` Dennis Zhou
@ 2020-07-22 23:08   ` Thomas Gleixner
  2020-07-22 23:25     ` Dennis Zhou
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-07-22 23:08 UTC (permalink / raw)
  To: Dennis Zhou, Nick Desaulniers
  Cc: Ingo Molnar, Borislav Petkov, Dennis Zhou, Tejun Heo,
	Christoph Lameter, x86, H. Peter Anvin, Al Viro, Andrew Morton,
	Peter Zijlstra, linux-kernel, clang-built-linux

Dennis Zhou <dennis@kernel.org> writes:
> On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
>> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
>> as the final patch (mine) added. The commit of the final patch discusses
>> some of the architectural differences between GCC and Clang, and the
>> kernels tickling of this difference for i386, which necessitated these
>> patches.
>> 
>> Brian Gerst (10):
>>   x86/percpu: Introduce size abstraction macros
>>   x86/percpu: Clean up percpu_to_op()
>>   x86/percpu: Clean up percpu_from_op()
>>   x86/percpu: Clean up percpu_add_op()
>>   x86/percpu: Remove "e" constraint from XADD
>>   x86/percpu: Clean up percpu_add_return_op()
>>   x86/percpu: Clean up percpu_xchg_op()
>>   x86/percpu: Clean up percpu_cmpxchg_op()
>>   x86/percpu: Clean up percpu_stable_op()
>>   x86/percpu: Remove unused PER_CPU() macro
>> 
>> Nick Desaulniers (1):
>>   x86: support i386 with Clang
>> 
>>  arch/x86/include/asm/percpu.h  | 510 +++++++++++----------------------
>>  arch/x86/include/asm/uaccess.h |   4 +-
>>  2 files changed, 175 insertions(+), 339 deletions(-)
>> 
>> -- 
>> 2.28.0.rc0.105.gf9edc3c819-goog
>> 
>
> This looks great to me! I applied it to for-5.9.

You applied it? I'm not aware that you're maintaining x86 nowadays.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/11] i386 Clang support
  2020-07-22 23:08   ` Thomas Gleixner
@ 2020-07-22 23:25     ` Dennis Zhou
  0 siblings, 0 replies; 34+ messages in thread
From: Dennis Zhou @ 2020-07-22 23:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dennis Zhou, Nick Desaulniers, Ingo Molnar, Borislav Petkov,
	Tejun Heo, Christoph Lameter, x86, H. Peter Anvin, Al Viro,
	Andrew Morton, Peter Zijlstra, linux-kernel, clang-built-linux

On Thu, Jul 23, 2020 at 01:08:42AM +0200, Thomas Gleixner wrote:
> Dennis Zhou <dennis@kernel.org> writes:
> > On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
> >> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> >> as the final patch (mine) added. The commit of the final patch discusses
> >> some of the architectural differences between GCC and Clang, and the
> >> kernels tickling of this difference for i386, which necessitated these
> >> patches.
> >> 
> >> Brian Gerst (10):
> >>   x86/percpu: Introduce size abstraction macros
> >>   x86/percpu: Clean up percpu_to_op()
> >>   x86/percpu: Clean up percpu_from_op()
> >>   x86/percpu: Clean up percpu_add_op()
> >>   x86/percpu: Remove "e" constraint from XADD
> >>   x86/percpu: Clean up percpu_add_return_op()
> >>   x86/percpu: Clean up percpu_xchg_op()
> >>   x86/percpu: Clean up percpu_cmpxchg_op()
> >>   x86/percpu: Clean up percpu_stable_op()
> >>   x86/percpu: Remove unused PER_CPU() macro
> >> 
> >> Nick Desaulniers (1):
> >>   x86: support i386 with Clang
> >> 
> >>  arch/x86/include/asm/percpu.h  | 510 +++++++++++----------------------
> >>  arch/x86/include/asm/uaccess.h |   4 +-
> >>  2 files changed, 175 insertions(+), 339 deletions(-)
> >> 
> >> -- 
> >> 2.28.0.rc0.105.gf9edc3c819-goog
> >> 
> >
> > This looks great to me! I applied it to for-5.9.
> 
> You applied it? I'm not aware that you're maintaining x86 nowadays.
> 
> Thanks,
> 
>         tglx

I'm sorry I overstepped. I've dropped them. Please take them with my
ack.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-20 20:49 ` [PATCH v3 11/11] x86: support i386 with Clang Nick Desaulniers
@ 2020-07-23  9:14   ` Thomas Gleixner
  2020-07-23  9:17     ` Thomas Gleixner
  2020-07-23 10:43   ` [tip: x86/asm] x86/uaccess: Make __get_user_size() Clang compliant on 32-bit tip-bot2 for Nick Desaulniers
  1 sibling, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-07-23  9:14 UTC (permalink / raw)
  To: Nick Desaulniers, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Nick Desaulniers, Arnd Bergmann,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

Nick Desaulniers <ndesaulniers@google.com> writes:

I'm glad I looked at this myself.

> We also don't want to swap the use of "=q" with "=r". For 64b, it
> doesn't matter. For 32b, it's possible that a 32b register without an 8b
> lower alias (i.e. ESI, EDI, EBP) is selected, which the assembler will
> then reject.

The above is really garbage.

We don't want? It's simply not possible to do so, because ...

64b, 32b, 8b. For heaven's sake, is it too much to ask to write a
changelog with understandable wording instead of ambiguous abbreviations?

There is no maximum character limit for changelogs.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23  9:14   ` Thomas Gleixner
@ 2020-07-23  9:17     ` Thomas Gleixner
  2020-07-23 11:06       ` Sedat Dilek
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-07-23  9:17 UTC (permalink / raw)
  To: Nick Desaulniers, Ingo Molnar, Borislav Petkov
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, x86, H. Peter Anvin,
	Al Viro, Andrew Morton, Peter Zijlstra, linux-kernel,
	clang-built-linux, Nick Desaulniers, Arnd Bergmann,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

Thomas Gleixner <tglx@linutronix.de> writes:
> Nick Desaulniers <ndesaulniers@google.com> writes:
>
> I'm glad I looked at this myself.
>
>> We also don't want to swap the use of "=q" with "=r". For 64b, it
>> doesn't matter. For 32b, it's possible that a 32b register without an 8b
>> lower alias (i.e. ESI, EDI, EBP) is selected, which the assembler will
>> then reject.
>
> The above is really garbage.
>
> We don't want? It's simply not possible to do so, because ...
>
> 64b, 32b, 8b. For heaven's sake, is it too much to ask to write a
> changelog with understandable wording instead of ambiguous abbreviations?
>
> There is no maximum character limit for changelogs.

Gah. Hit send too fast.

>> With this, Clang can finally build an i386 defconfig.

With what? I can't find anything which explains the solution at the
conceptual level. Sigh.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/uaccess: Make __get_user_size() Clang compliant on 32-bit
  2020-07-20 20:49 ` [PATCH v3 11/11] x86: support i386 with Clang Nick Desaulniers
  2020-07-23  9:14   ` Thomas Gleixner
@ 2020-07-23 10:43   ` tip-bot2 for Nick Desaulniers
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot2 for Nick Desaulniers @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Arnd Bergmann, David Woodhouse, Dmitry Golovin, Linus Torvalds,
	Nick Desaulniers, Thomas Gleixner, Sedat Dilek, Dennis Zhou, x86,
	LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     158807de5822d1079e162a3762956fd743dd483e
Gitweb:        https://git.kernel.org/tip/158807de5822d1079e162a3762956fd743dd483e
Author:        Nick Desaulniers <ndesaulniers@google.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:25 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 12:38:31 +02:00

x86/uaccess: Make __get_user_size() Clang compliant on 32-bit

Clang fails to compile __get_user_size() on 32-bit for the following code:

      long long val;

      __get_user(val, usrptr);

with: error: invalid output size for constraint '=q'

GCC compiles the same code without complaints.

The reason is that GCC and Clang are architecturally different, which leads
to subtle issues for code that's invalid but clearly dead, i.e. with code
that emulates polymorphism with the preprocessor and sizeof.

GCC will perform semantic analysis after early inlining and dead code
elimination, so it will not warn on invalid code that's dead. Clang
strictly performs optimizations after semantic analysis, so it will warn
even for dead code.

Neither Clang nor GCC likes this very much with -m32:

long long ret;
asm ("movb $5, %0" : "=q" (ret));

However, GCC can tolerate this variant:

long long ret;
switch (sizeof(ret)) {
case 1:
        asm ("movb $5, %0" : "=q" (ret));
        break;
case 8:;
}

Clang, on the other hand, won't accept that because it validates the inline
asm for the '1' case before the optimisation phase where it realises that
it wouldn't have to emit it anyway.

If LLVM (Clang's "back end") fails, such as during instruction selection or
register allocation, it cannot provide accurate diagnostics (warnings /
errors) that contain line information, as the AST has been discarded from
memory at that point.

While there have been early discussions about having C/C++-specific
language optimizations in Clang via the use of MLIR, which would enable
such earlier optimizations, that work is not yet scoped and is likely a
multi-year endeavor.

It was discussed to change the asm output constraint for the one byte case
from "=q" to "=r". While that works for 64-bit, it fails on 32-bit. With
'=r' the compiler could fail to choose a register accessible as high/low
byte, which is required for the byte operation. If that happens, the
assembly will fail.

Use a local temporary variable of type 'unsigned char' as output for the
byte copy inline asm and then assign it to the real output variable. This
prevents Clang from failing the semantic analysis in the above case.

The resulting code for the actual one byte copy is not affected as the
temporary variable is optimized out.
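
As a usage sketch (the user pointer is hypothetical), both the narrow
and the wide cases keep working; only case 1 goes through the temporary:

	u8 b;
	u64 q;

	__get_user(b, (u8 __user *)ubuf);  /* case 1: reads into x_u8__, then assigns to b */
	__get_user(q, (u64 __user *)ubuf); /* case 8: x_u8__ is unreferenced and optimized out */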

[ tglx: Amended changelog ]

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/3
Link: https://github.com/ClangBuiltLinux/linux/issues/194
Link: https://github.com/ClangBuiltLinux/linux/issues/781
Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dwmw2@infradead.org/
Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
Link: https://lkml.kernel.org/r/20200720204925.3654302-12-ndesaulniers@google.com
---
 arch/x86/include/asm/uaccess.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 18dfa07..2f3e8f2 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -314,11 +314,14 @@ do {									\
 
 #define __get_user_size(x, ptr, size, retval)				\
 do {									\
+	unsigned char x_u8__;						\
+									\
 	retval = 0;							\
 	__chk_user_ptr(ptr);						\
 	switch (size) {							\
 	case 1:								\
-		__get_user_asm(x, ptr, retval, "b", "=q");		\
+		__get_user_asm(x_u8__, ptr, retval, "b", "=q");		\
+		(x) = x_u8__;						\
 		break;							\
 	case 2:								\
 		__get_user_asm(x, ptr, retval, "w", "=r");		\

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Remove unused PER_CPU() macro
  2020-07-20 20:49 ` [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Peter Zijlstra (Intel),
	Linus Torvalds, Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     4719ffecbb0659faf1fd39f4b8eb2674f0042890
Gitweb:        https://git.kernel.org/tip/4719ffecbb0659faf1fd39f4b8eb2674f0042890
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:24 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:43 +02:00

x86/percpu: Remove unused PER_CPU() macro

Also remove the now-unused __percpu_mov_op.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-11-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2..a3c33b7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@
 
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
-#define __percpu_mov_op		movq
 #else
 #define __percpu_seg		fs
-#define __percpu_mov_op		movl
 #endif
 
 #ifdef __ASSEMBLY__
 
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, reg)						\
-	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
-	lea var(reg), reg
 #define PER_CPU_VAR(var)	%__percpu_seg:var
 #else /* ! SMP */
-#define PER_CPU(var, reg)	__percpu_mov_op $var, reg
 #define PER_CPU_VAR(var)	var
 #endif	/* SMP */
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_stable_op()
  2020-07-20 20:49 ` [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     c94055fe93c8d00bfa23fa2cb9af080f7fc53aa0
Gitweb:        https://git.kernel.org/tip/c94055fe93c8d00bfa23fa2cb9af080f7fc53aa0
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:23 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:42 +02:00

x86/percpu: Clean up percpu_stable_op()

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size(), which is now unused.
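
For reference, the generic dispatcher reused here lives in
include/linux/percpu-defs.h and looks roughly like this (abridged):

#define __pcpu_size_call_return(stem, variable)				\
({									\
	typeof(variable) pscr_ret__;					\
	__verify_pcpu_ptr(&(variable));					\
	switch(sizeof(variable)) {					\
	case 1: pscr_ret__ = stem##1(variable); break;			\
	case 2: pscr_ret__ = stem##2(variable); break;			\
	case 4: pscr_ret__ = stem##4(variable); break;			\
	case 8: pscr_ret__ = stem##8(variable); break;			\
	default:							\
		__bad_size_call_parameter(); break;			\
	}								\
	pscr_ret__;							\
})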

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-10-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 41 +++++++++-------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5..cf2b9c2 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@
 
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
-extern void __bad_percpu_size(void);
 
 #define __pcpu_type_1 u8
 #define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do {									\
 	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
-#define percpu_stable_op(op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_stable_op(size, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "p" (&(_var)));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 /*
@@ -258,7 +237,11 @@ do {									\
  * per-thread variables implemented as per-cpu variables and thus
  * stable for the duration of the respective task.
  */
-#define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp)	percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp)	percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp)	percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp)	percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp)	__pcpu_size_call_return(this_cpu_read_stable_, pcp)
 
 #define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
 #define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_cmpxchg_op()
  2020-07-20 20:49 ` [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     ebcd580bed4a357ea894e6878d9099b3919f727f
Gitweb:        https://git.kernel.org/tip/ebcd580bed4a357ea894e6878d9099b3919f727f
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:22 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:42 +02:00

x86/percpu: Clean up percpu_cmpxchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
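
As a usage sketch (the variable and values are hypothetical), the
generic accessor picks the sized variant via sizeof() and lands in
percpu_cmpxchg_op() on x86:

	DEFINE_PER_CPU(int, my_state);

	/* compare-and-swap this CPU's copy: 0 -> 1, no lock prefix needed */
	int old = this_cpu_cmpxchg(my_state, 0, 1);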

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-9-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 58 ++++++++++------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e7..7efc0b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do {									\
  * cmpxchg has no such implied lock semantics as a result it is much
  * more efficient for cpu local operations.
  */
-#define percpu_cmpxchg_op(qual, var, oval, nval)			\
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)		\
 ({									\
-	typeof(var) pco_ret__;						\
-	typeof(var) pco_old__ = (oval);					\
-	typeof(var) pco_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("cmpxchgb %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "q" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("cmpxchgw %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("cmpxchgl %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("cmpxchgq %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pco_ret__;							\
+	__pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);	\
+	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  : [oval] "+a" (pco_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pco_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pco_old__;			\
 })
 
 /*
@@ -336,16 +314,16 @@ do {									\
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
 #define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
 #define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)		\
@@ -376,7 +354,7 @@ do {									\
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do {									\
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
 
 /*
  * Pretty complex macro to generate cmpxchg16 instruction.  The instruction

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_xchg_op()
  2020-07-20 20:49 ` [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     73ca542fbabb68deaa90130a8153cab1fa8288fe
Gitweb:        https://git.kernel.org/tip/73ca542fbabb68deaa90130a8153cab1fa8288fe
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:21 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:41 +02:00

x86/percpu: Clean up percpu_xchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-8-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 61 ++++++++++------------------------
 1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11..ac6d7e7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do {									\
  * expensive due to the implied lock prefix.  The processor cannot prefetch
  * cachelines if xchg is used.
  */
-#define percpu_xchg_op(qual, var, nval)					\
+#define percpu_xchg_op(size, qual, _var, _nval)				\
 ({									\
-	typeof(var) pxo_ret__;						\
-	typeof(var) pxo_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%al"		\
-		    "\n1:\tcmpxchgb %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "q" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%ax"		\
-		    "\n1:\tcmpxchgw %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%eax"		\
-		    "\n1:\tcmpxchgl %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%rax"		\
-		    "\n1:\tcmpxchgq %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pxo_ret__;							\
+	__pcpu_type_##size pxo_old__;					\
+	__pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]),		\
+				    "%[oval]")				\
+		  "\n1:\t"						\
+		  __pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  "\n\tjnz 1b"						\
+		  : [oval] "=&a" (pxo_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pxo_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pxo_old__;			\
 })
 
 /*
@@ -354,9 +329,9 @@ do {									\
 #define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
 #define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
 #define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(4, volatile, pcp, nval)
 
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do {									\
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
 /*

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_add_return_op()
  2020-07-20 20:49 ` [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     bbff583b84a130d4d1234d68906c41690575be36
Gitweb:        https://git.kernel.org/tip/bbff583b84a130d4d1234d68906c41690575be36
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:20 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:41 +02:00

x86/percpu: Clean up percpu_add_return_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-7-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 51 ++++++++++------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440..0776a11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do {									\
 /*
  * Add return operation
  */
-#define percpu_add_return_op(qual, var, val)				\
+#define percpu_add_return_op(size, qual, _var, _val)			\
 ({									\
-	typeof(var) paro_ret__ = val;					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("xaddb %0, "__percpu_arg(1)			\
-			    : "+q" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("xaddw %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("xaddl %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	paro_ret__ += val;						\
-	paro_ret__;							\
+	__pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);	\
+	asm qual (__pcpu_op2_##size("xadd", "%[tmp]",			\
+				     __percpu_arg([var]))		\
+		  : [tmp] __pcpu_reg_##size("+", paro_tmp__),		\
+		    [var] "+m" (_var)					\
+		  : : "memory");					\
+	(typeof(_var))(unsigned long) (paro_tmp__ + _val);		\
 })
 
 /*
@@ -377,16 +358,16 @@ do {									\
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 
-#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
 #define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
 #define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do {									\
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
@@ -427,7 +408,7 @@ do {									\
 #define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Remove "e" constraint from XADD
  2020-07-20 20:49 ` [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     e4d16defbbde028aeab2026995f0ced4240df6d6
Gitweb:        https://git.kernel.org/tip/e4d16defbbde028aeab2026995f0ced4240df6d6
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:19 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:40 +02:00

x86/percpu: Remove "e" constraint from XADD

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.
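
For illustration (the operands are hypothetical): "e" permits a 32-bit
sign-extended immediate, but XADD's source operand must be a register,
so there is no encoding for something like:

	xaddq $1, %gs:my_counter	/* invalid: no immediate form of XADD */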

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-6-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c..9bb5440 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do {									\
 		break;							\
 	case 8:								\
 		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+re" (paro_ret__), "+m" (var)		\
+			    : "+r" (paro_ret__), "+m" (var)		\
 			    : : "memory");				\
 		break;							\
 	default: __bad_percpu_size();					\

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_add_op()
  2020-07-20 20:49 ` [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     33e5614a435ff8047d768e6501454ae1cc7f131f
Gitweb:        https://git.kernel.org/tip/33e5614a435ff8047d768e6501454ae1cc7f131f
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:18 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:40 +02:00

x86/percpu: Clean up percpu_add_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
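
As an illustration of the inc/dec special case this cleanup preserves
(the variable name is hypothetical):

	DEFINE_PER_CPU(unsigned int, my_count);

	this_cpu_add(my_count, 1);	/* constant +/-1: emits inc/dec on the percpu address */
	this_cpu_add(my_count, 2);	/* anything else: emits an add instruction */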

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-5-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h |  99 +++++++--------------------------
 1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e0..2a24f3c 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do {									\
 	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
+#define percpu_unary_op(size, qual, op, _var)				\
+({									\
+	asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))		\
+	    : [var] "+m" (_var));					\
+})
+
 /*
  * Generate a percpu add to memory instruction and optimize code
  * if one is added or subtracted.
  */
-#define percpu_add_op(qual, var, val)					\
+#define percpu_add_op(size, qual, var, val)				\
 do {									\
-	typedef typeof(var) pao_T__;					\
 	const int pao_ID__ = (__builtin_constant_p(val) &&		\
 			      ((val) == 1 || (val) == -1)) ?		\
 				(int)(val) : 0;				\
 	if (0) {							\
-		pao_T__ pao_tmp__;					\
+		typeof(var) pao_tmp__;					\
 		pao_tmp__ = (val);					\
 		(void)pao_tmp__;					\
 	}								\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incb "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decb "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addb %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "qi" ((pao_T__)(val)));			\
-		break;							\
-	case 2:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incw "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decw "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addw %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 4:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incl "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decl "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addl %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 8:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incq "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decq "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addq %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "re" ((pao_T__)(val)));			\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
+	if (pao_ID__ == 1)						\
+		percpu_unary_op(size, qual, "inc", var);		\
+	else if (pao_ID__ == -1)					\
+		percpu_unary_op(size, qual, "dec", var);		\
+	else								\
+		percpu_to_op(size, qual, "add", var, val);		\
 } while (0)
 
 #define percpu_from_op(size, qual, op, _var)				\
@@ -228,29 +196,6 @@ do {									\
 	pfo_ret__;					\
 })
 
-#define percpu_unary_op(qual, op, var)			\
-({							\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-})
-
 /*
  * Add return operation
  */
@@ -388,9 +333,9 @@ do {									\
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
 #define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
 #define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
 #define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do {									\
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
 #define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
 #define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
 #define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do {									\
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do {									\
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_from_op()
  2020-07-20 20:49 ` [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     bb631e3002840706362a7d76e3ebb3604cce91a7
Gitweb:        https://git.kernel.org/tip/bb631e3002840706362a7d76e3ebb3604cce91a7
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:17 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Clean up percpu_from_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-4-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 50 ++++++++++------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fb..a40d2e0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do {									\
 	}								\
 } while (0)
 
-#define percpu_from_op(qual, op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_from_op(size, qual, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "m" (_var));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 #define percpu_stable_op(op, var)			\
@@ -401,9 +381,9 @@ do {									\
  */
 #define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
 
-#define raw_cpu_read_1(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
 
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do {									\
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do {									\
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do {									\
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Clean up percpu_to_op()
  2020-07-20 20:49 ` [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op() Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     c175acc14719e69ecec4dafbb642a7f38c76c064
Gitweb:        https://git.kernel.org/tip/c175acc14719e69ecec4dafbb642a7f38c76c064
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:16 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Clean up percpu_to_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-3-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 90 +++++++++++++---------------------
 1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4..fb280fb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
 #define __pcpu_reg_imm_4(x) "ri" (x)
 #define __pcpu_reg_imm_8(x) "re" (x)
 
-#define percpu_to_op(qual, op, var, val)		\
-do {							\
-	typedef typeof(var) pto_T__;			\
-	if (0) {					\
-		pto_T__ pto_tmp__;			\
-		pto_tmp__ = (val);			\
-		(void)pto_tmp__;			\
-	}						\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "qi" ((pto_T__)(val)));		\
-		break;					\
-	case 2:						\
-		asm qual (op "w %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 4:						\
-		asm qual (op "l %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 8:						\
-		asm qual (op "q %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "re" ((pto_T__)(val)));		\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
+#define percpu_to_op(size, qual, op, _var, _val)			\
+do {									\
+	__pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);	\
+	if (0) {		                                        \
+		typeof(_var) pto_tmp__;					\
+		pto_tmp__ = (_val);					\
+		(void)pto_tmp__;					\
+	}								\
+	asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))	\
+	    : [var] "+m" (_var)						\
+	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
 /*
@@ -425,18 +405,18 @@ do {									\
 #define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
 
-#define raw_cpu_write_1(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val)	percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val)		percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val)		percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val)		percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val)		percpu_to_op(4, , "or", (pcp), val)
 
 /*
  * raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do {									\
 #define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
 #define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do {									\
  */
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val)		percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val)			percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val)			percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val)		percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val)			percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
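
[ Illustrative aside, not part of the patch: the "if (0)" block exists
  only to type-check val against the variable's type at compile time;
  it emits no code. With the width chosen by token pasting, a use such
  as raw_cpu_write_1(pcp, x) expands to roughly the sketch below, where
  "pcp" and "x" are placeholders. ]

do {
	u8 pto_val__ = (u8)(((unsigned long)x) & 0xff);	/* __pcpu_cast_1 */
	if (0) {				/* type check only, no code */
		typeof(pcp) pto_tmp__;
		pto_tmp__ = (x);
		(void)pto_tmp__;
	}
	asm ("movb %[val], " __percpu_arg([var])	/* __pcpu_op2_1 */
	    : [var] "+m" (pcp)
	    : [val] "qi" (pto_val__));			/* __pcpu_reg_imm_1 */
} while (0)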

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: x86/asm] x86/percpu: Introduce size abstraction macros
  2020-07-20 20:49 ` [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros Nick Desaulniers
@ 2020-07-23 10:43   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Brian Gerst @ 2020-07-23 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Nick Desaulniers, Thomas Gleixner, Sedat Dilek,
	Linus Torvalds, Peter Zijlstra (Intel),
	Dennis Zhou, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     6865dc3ae93b9acb336ca48bd7b2db3446d89370
Gitweb:        https://git.kernel.org/tip/6865dc3ae93b9acb336ca48bd7b2db3446d89370
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Mon, 20 Jul 2020 13:49:15 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Introduce size abstraction macros

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Link: https://lkml.kernel.org/r/20200720204925.3654302-2-ndesaulniers@google.com

---
 arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797..19838e4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
 
+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
+#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
+#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(mod, x) mod "q" (x)
+#define __pcpu_reg_2(mod, x) mod "r" (x)
+#define __pcpu_reg_4(mod, x) mod "r" (x)
+#define __pcpu_reg_8(mod, x) mod "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
 #define percpu_to_op(qual, op, var, val)		\
 do {							\
 	typedef typeof(var) pto_T__;			\
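
[ Illustrative aside: how the new helpers compose. The op macros paste
  the width suffix and operand order, and the reg macros pick a
  register constraint that is legal for that width, for example:

	__pcpu_op2_1("mov", "%[val]", dst)  ->  "movb %[val], " dst
	__pcpu_op2_8("add", src, dst)       ->  "addq " src ", " dst
	__pcpu_reg_1(mod, x)                ->  mod "q" (x)
	__pcpu_reg_4(mod, x)                ->  mod "r" (x)

  "q" restricts the operand to registers with an 8-bit low part (eax,
  ebx, ecx, edx on 32-bit), whereas "r" allows any general-purpose
  register; this is the same 32-bit constraint issue discussed for
  patch 11 later in this thread. ]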

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23  9:17     ` Thomas Gleixner
@ 2020-07-23 11:06       ` Sedat Dilek
  2020-07-23 11:42         ` Arnd Bergmann
  0 siblings, 1 reply; 34+ messages in thread
From: Sedat Dilek @ 2020-07-23 11:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Desaulniers, Ingo Molnar, Borislav Petkov, Dennis Zhou,
	Tejun Heo, Christoph Lameter, x86, H. Peter Anvin, Al Viro,
	Andrew Morton, Peter Zijlstra, linux-kernel,
	Clang-Built-Linux ML, Arnd Bergmann, David Woodhouse,
	Dmitry Golovin, Linus Torvalds

On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Nick Desaulniers <ndesaulniers@google.com> writes:
> >
> > I'm glad I looked myself at this.
> >
> >> We also don't want to swap the use of "=q" with "=r". For 64b, it
> >> doesn't matter. For 32b, it's possible that a 32b register without a 8b
> >> lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
> >> then reject.
> >
> > The above is really garbage.
> >
> > We don't want? It's simply not possible to do so, because ...
> >
> > 64b,32b,8b. For heavens sake is it too much asked to write a changelog
> > with understandable wording instead of ambiguous abbreviations?
> >
> > There is no maximum character limit for changelogs.
>
> Gah. Hit send too fast.
>
> >> With this, Clang can finally build an i386 defconfig.
>
> With what? I can't find anything which explains the solution at the
> conceptual level. Sigh.
>

Hi,

I have applied this patch series v3, but some basics of "i386" usage
were not clear to me when I wanted to test it and give feedback.

[1] is the original place in CBL where this was reported, and I have
commented on it there.

Beyond some old cruft in i386_defconfig, like the non-existent
"CONFIG_CRYPTO_AES_586", I have some fundamental questions:

What does "ARCH=i386" mean, and what is it used for?

I can do:

$ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
$ make V=1 -j3 $MAKE_OPTS i386_defconfig

...which results in the same .config.

Whereas when I do:

$ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig

...the CONFIG_64BIT line is dropped entirely.

But "# CONFIG_64BIT is not set" is explicitly set in
arch/x86/configs/i386_defconfig but gets dropped.

I am unsure whether the above is the same as:
$ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig

When generating via "make ... i386_defconfig", a modern gcc-9 and a
snapshot version of clang-11 both build with:

$ ARCH=x86 make V=1 -j3 $MAKE_OPTS
... -march=i686 -mtune=generic ...

Checking generated .config reveals:

CONFIG_M686=y

So I guess modern compilers support at least "i686" as the lowest CPU?

Doing some grepping:

$ git grep "ARCH=i386"
Documentation/kbuild/headers_install.rst:  make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
tools/testing/ktest/examples/crosstests.conf:MAKE_CMD = make ARCH=i386
tools/testing/ktest/sample.conf:#MAKE_CMD = CC=i386-gcc AS=i386-as make ARCH=i386

i386-gcc / i386-as: does anyone still use those?

My question again (I did not actually diff them):

$ make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
$ make headers_install ARCH=x86 INSTALL_HDR_PATH=/usr

...should these generate the same headers?

To come back to "i386" again:

$ git grep i386 | grep ARCH

...reveals in top-level Makefile [2]:

376: # Additional ARCH settings for x86
377: ifeq ($(ARCH),i386)
378:        SRCARCH := x86
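
(For completeness, a sketch from memory with line numbers omitted: the
lines just below do the same for 64-bit,

ifeq ($(ARCH),x86_64)
        SRCARCH := x86
endif

so both ARCH values end up building from arch/x86.)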

For me this means:

ARCH=i386 make ...
ARCH=x86 make ...

...should result in the same .config, so why is the CONFIG_64BIT line
dropped when "ARCH=i386" is used?

Coming to a conclusion:

Nick D. says:
> I usually test with make ... i386_defconfig.

Can you enlighten me a bit?

Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
line from i386_defconfig.

Thanks.

Regards,
- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/194
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile#n376

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23 11:06       ` Sedat Dilek
@ 2020-07-23 11:42         ` Arnd Bergmann
  2020-07-23 13:14           ` Sedat Dilek
  0 siblings, 1 reply; 34+ messages in thread
From: Arnd Bergmann @ 2020-07-23 11:42 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Thomas Gleixner, Nick Desaulniers, Ingo Molnar, Borislav Petkov,
	Dennis Zhou, Tejun Heo, Christoph Lameter,
	the arch/x86 maintainers, H. Peter Anvin, Al Viro, Andrew Morton,
	Peter Zijlstra, linux-kernel, Clang-Built-Linux ML,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

On Thu, Jul 23, 2020 at 1:07 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
>
> I have applied this patch series v3, but some basics of "i386" usage
> were not clear to me when I wanted to test it and give feedback.
>
> [1] is the original place in CBL where this was reported, and I have
> commented on it there.
>
> Beyond some old cruft in i386_defconfig, like the non-existent
> "CONFIG_CRYPTO_AES_586", I have some fundamental questions:
>
> What does "ARCH=i386" mean, and what is it used for?
>
> I can do:
>
> $ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
> $ make V=1 -j3 $MAKE_OPTS i386_defconfig
>
> ...which results in the same .config.
>
> Whereas when I do:
>
> $ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig
>
> ...the CONFIG_64BIT line is dropped entirely.
>
> But "# CONFIG_64BIT is not set" is explicitly set in
> arch/x86/configs/i386_defconfig but gets dropped.
>
> I am unsure whether the above is the same as:
> $ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig

The logic was introduced when arch/i386 and arch/x86_64 got
merged into arch/x86, to stay compatible with the original behavior
that would produce a 32-bit or 64-bit kernel depending on which
machine you are running on.

There are probably not a lot of people building kernels on 32-bit
machines any more (real 32-bit machines are really slow compared
to modern ones, and 64-bit machines running 32-bit distros usually
want a 64-bit kernel), so it could in theory be changed.

It will certainly break someone's workflow though, so nobody has
proposed actually changing it so far.

> When generating via "make ... i386_defconfig", a modern gcc-9 and a
> snapshot version of clang-11 both build with:
>
> $ ARCH=x86 make V=1 -j3 $MAKE_OPTS
> ... -march=i686 -mtune=generic ...
>
> Checking generated .config reveals:
>
> CONFIG_M686=y
>
> So I guess modern compilers support at least "i686" as the lowest CPU?

i686 compiler support goes back to the 1990s, and the kernel now
requires at least gcc-4.9 from 2014, so yes.

> Nick D. says:
> > I usually test with make ... i386_defconfig.
>
> Can you enlighten me a bit?
>
> Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
> line from i386_defconfig.

The "i386" in i386_defconfig is just a synonym for x86-32, it does not
imply a particular CPU generation. The original i386 is no longer supported,
i486sx (barely) is and in practice most 32-bit Linux code gets compiled
for some variant of i586 or i686 variant but run on 64-bit hardware.
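
(For reference, a sketch from memory, exact symbols may differ: the
per-CPU -march selection lives in arch/x86/Makefile_32.cpu, roughly

	cflags-$(CONFIG_M486)	+= -march=i486
	cflags-$(CONFIG_M586)	+= -march=i586
	cflags-$(CONFIG_M686)	+= -march=i686

so CONFIG_M686=y is what produces the -march=i686 you saw.)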

      Arnd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23 11:42         ` Arnd Bergmann
@ 2020-07-23 13:14           ` Sedat Dilek
  2020-07-23 13:55             ` Arnd Bergmann
  0 siblings, 1 reply; 34+ messages in thread
From: Sedat Dilek @ 2020-07-23 13:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Nick Desaulniers, Ingo Molnar, Borislav Petkov,
	Dennis Zhou, Tejun Heo, Christoph Lameter,
	the arch/x86 maintainers, H. Peter Anvin, Al Viro, Andrew Morton,
	Peter Zijlstra, linux-kernel, Clang-Built-Linux ML,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

On Thu, Jul 23, 2020 at 1:42 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Jul 23, 2020 at 1:07 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Thomas Gleixner <tglx@linutronix.de> writes:
> >
> > I have applied this patch-series v3 but some basics of "i386" usage
> > are not clear to me when I wanted to test it and give some feedback.
> >
> > [1] is the original place in CBL where this was reported and I have
> > commented on this.
> >
> > Beyond some old cruft in i386_defconfig like non-existent
> > "CONFIG_CRYPTO_AES_586" I have some fundamental questions:
> >
> > What means "ARCH=i386" and where it is used (for)?
> >
> > I can do:
> >
> > $ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
> > $ make V=1 -j3 $MAKE_OPTS i386_defconfig
> >
> > ...which results in the same .config.
> >
> > Whereas when I do:
> >
> > $ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig
> >
> > ...drops CONFIG_64BIT line entirely.
> >
> > But "# CONFIG_64BIT is not set" is explicitly set in
> > arch/x86/configs/i386_defconfig but gets dropped.
> >
> > Unsure if above is the same like:
> > $ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig
>
> The logic was introduced when arch/i386 and arch/x86_64 got
> merged into arch/x86, to stay compatible with the original behavior
> that would produce a 32-bit or 64-bit kernel depending on which
> machine you are running on.
>
> There are probably not a lot of people building kernels on 32-bit
> machines any more (real 32-bit machines are really slow compared
> to modern ones, and 64-bit machines running 32-bit distros usually
> want a 64-bit kernel), so it could in theory be changed.
>
> It will certainly break someone's workflow though, so nobody has
> proposed actually changing it so far.
>
> > When generating via "make ... i386_defconfig" modern gcc-9 and and a
> > snapshot version of clang-11 build both with:
> >
> > $ ARCH=x86 make V=1 -j3 $MAKE_OPTS
> > ... -march=i686 -mtune=generic ...
> >
> > Checking generated .config reveals:
> >
> > CONFIG_M686=y
> >
> > So, I guess modern compilers do at least support "i686" as lowest CPU?
>
> i686 compiler support goes back to the 1990s, and the kernel now
> requires at least gcc-4.9 from 2014, so yes.
>
> > Nick D. says:
> > > I usually test with make ... i386_defconfig.
> >
> > Can you enlighten a bit?
> >
> > Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
> > line from i386_defconfig.
>
> The "i386" in i386_defconfig is just a synonym for x86-32, it does not
> imply a particular CPU generation. The original i386 is no longer supported,
> i486sx (barely) is and in practice most 32-bit Linux code gets compiled
> for some variant of i586 or i686 variant but run on 64-bit hardware.
>

Thanks a lot, Arnd, for all the detailed information.

Changing i386_defconfig to x86_defconfig would cause a big outcry from
all the kernel-bot maintainers :-).

- Sedat -

P.S.: CONFIG_64BIT
What I dropped by accident from my previous mail:
What happens when there is no CONFIG_64BIT line?
There are explicit checks for CONFIG_64BIT (and its inverse), such as
"ifdef" and "ifndef", or "defined(...)" and its opposite.
I remember having seen such checks in the x86 tree.

- EOT -

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23 13:14           ` Sedat Dilek
@ 2020-07-23 13:55             ` Arnd Bergmann
  2020-07-24 13:22               ` Sedat Dilek
  0 siblings, 1 reply; 34+ messages in thread
From: Arnd Bergmann @ 2020-07-23 13:55 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Thomas Gleixner, Nick Desaulniers, Ingo Molnar, Borislav Petkov,
	Dennis Zhou, Tejun Heo, Christoph Lameter,
	the arch/x86 maintainers, H. Peter Anvin, Al Viro, Andrew Morton,
	Peter Zijlstra, linux-kernel, Clang-Built-Linux ML,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

On Thu, Jul 23, 2020 at 3:14 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> What happens when there is no CONFIG_64BIT line?
> There are explicit checks for CONFIG_64BIT (and its inverse), such as
> "ifdef" and "ifndef", or "defined(...)" and its opposite.
> I remember having seen such checks in the x86 tree.

As long as you consistently pass ARCH=i386 when running 'make',
nothing bad happens, as ARCH=i386 just hides that option.

If you run "make ARCH=i386 defconfig" followed by "make olddefconfig"
(without ARCH=i386) on a non-i386 machine, the absence of that
CONFIG_64BIT line will lead to the kernel going back to a 64-bit
configuration.
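
For reference, the mechanism is the 64BIT symbol in arch/x86/Kconfig,
which (quoting roughly from memory) reads:

config 64BIT
	bool "64-bit kernel" if "$(ARCH)" = "x86"
	default "$(ARCH)" != "i386"

That is, the prompt is only visible for ARCH=x86, and the hidden option
defaults to n for ARCH=i386 but back to y once ARCH=i386 is no longer
passed and the line is absent from .config.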

    Arnd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 11/11] x86: support i386 with Clang
  2020-07-23 13:55             ` Arnd Bergmann
@ 2020-07-24 13:22               ` Sedat Dilek
  0 siblings, 0 replies; 34+ messages in thread
From: Sedat Dilek @ 2020-07-24 13:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Nick Desaulniers, Ingo Molnar, Borislav Petkov,
	Dennis Zhou, Tejun Heo, Christoph Lameter,
	the arch/x86 maintainers, H. Peter Anvin, Al Viro, Andrew Morton,
	Peter Zijlstra, linux-kernel, Clang-Built-Linux ML,
	David Woodhouse, Dmitry Golovin, Linus Torvalds

On Thu, Jul 23, 2020 at 3:56 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Jul 23, 2020 at 3:14 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > What happens when there is no CONFIG_64BIT line?
> > There are explicit checks for CONFIG_64BIT (and its inverse), such as
> > "ifdef" and "ifndef", or "defined(...)" and its opposite.
> > I remember having seen such checks in the x86 tree.
>
> As long as you consistently pass ARCH=i386 when running 'make',
> nothing bad happens, as ARCH=i386 just hides that option.
>
> If you run "make ARCH=i386 defconfig" followed by "make olddefconfig"
> (without ARCH=i386) on a non-i386 machine, the absence of that
> CONFIG_64BIT line will lead to the kernel going back to a 64-bit
> configuration.
>

Thank you again for your feedback.

I am unsure whether people are aware of the different behaviours and results.

That is why I keep the same make invocation, with and without the
"defconfig" target.

Unfortunately, I had no opportunity to test the patchset :-(.

For testing I had done:
$ MAKE_OPTS="..."
$ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
  (where V=1 and -j3 can of course be dropped)
$ ARCH=x86 make V=1 -j3 $MAKE_OPTS

Side note:
How wonderful that my patch "x86/defconfigs: Remove
CONFIG_CRYPTO_AES_586 from i386_defconfig" landed in
<tip.git#x86/build> [1].

- Sedat -

[1] https://git.kernel.org/tip/tip/c/6526b12de07588253a52577f42ec99fc7ca26a1f

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2020-07-24 13:22 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-20 20:49 [PATCH v3 00/11] i386 Clang support Nick Desaulniers
2020-07-20 20:49 ` [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op() Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro Nick Desaulniers
2020-07-23 10:43   ` [tip: x86/asm] " tip-bot2 for Brian Gerst
2020-07-20 20:49 ` [PATCH v3 11/11] x86: support i386 with Clang Nick Desaulniers
2020-07-23  9:14   ` Thomas Gleixner
2020-07-23  9:17     ` Thomas Gleixner
2020-07-23 11:06       ` Sedat Dilek
2020-07-23 11:42         ` Arnd Bergmann
2020-07-23 13:14           ` Sedat Dilek
2020-07-23 13:55             ` Arnd Bergmann
2020-07-24 13:22               ` Sedat Dilek
2020-07-23 10:43   ` [tip: x86/asm] x86/uaccess: Make __get_user_size() Clang compliant on 32-bit tip-bot2 for Nick Desaulniers
2020-07-21  8:09 ` [PATCH v3 00/11] i386 Clang support Sedat Dilek
2020-07-21 22:27 ` Dennis Zhou
2020-07-22 23:08   ` Thomas Gleixner
2020-07-22 23:25     ` Dennis Zhou
