linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] x86: Clean up percpu operations
@ 2020-05-17 15:29 Brian Gerst
  2020-05-17 15:29 ` [PATCH 1/7] x86/percpu: Introduce size abstraction macros Brian Gerst
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu operations already have a switch on the width of the
data type, which resulted in an extra amount of dead code being
generated with the x86 operations having another switch.  This patch set
rewrites the x86 ops to remove the switch.  Additional cleanups are to
use named assembly operands, and to cast variables to the width used in
the assembly to make Clang happy.

Brian Gerst (7):
  x86/percpu: Introduce size abstraction macros
  x86/percpu: Clean up percpu_to_op()
  x86/percpu: Clean up percpu_from_op()
  x86/percpu: Clean up percpu_add_op()
  x86/percpu: Clean up percpu_add_return_op()
  x86/percpu: Clean up percpu_xchg_op()
  x86/percpu: Clean up percpu_cmpxchg_op()

 arch/x86/include/asm/percpu.h | 447 ++++++++++++----------------------
 1 file changed, 158 insertions(+), 289 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] x86/percpu: Introduce size abstraction macros
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 23:48   ` Nick Desaulniers
  2020-05-17 15:29 ` [PATCH 2/7] x86/percpu: Clean up percpu_to_op() Brian Gerst
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797c769d..89f918a3e99b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
 
+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)((unsigned long) val))
+#define __pcpu_cast_2(val) ((u16)((unsigned long) val))
+#define __pcpu_cast_4(val) ((u32)((unsigned long) val))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(out, x) out "q" (x)
+#define __pcpu_reg_2(out, x) out "r" (x)
+#define __pcpu_reg_4(out, x) out "r" (x)
+#define __pcpu_reg_8(out, x) out "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
 #define percpu_to_op(qual, op, var, val)		\
 do {							\
 	typedef typeof(var) pto_T__;			\
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
  2020-05-17 15:29 ` [PATCH 1/7] x86/percpu: Introduce size abstraction macros Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 21:15   ` Nick Desaulniers
  2020-05-17 15:29 ` [PATCH 3/7] x86/percpu: Clean up percpu_from_op() Brian Gerst
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
 1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 89f918a3e99b..233c7a78d1a6 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
 #define __pcpu_reg_imm_4(x) "ri" (x)
 #define __pcpu_reg_imm_8(x) "re" (x)
 
-#define percpu_to_op(qual, op, var, val)		\
-do {							\
-	typedef typeof(var) pto_T__;			\
-	if (0) {					\
-		pto_T__ pto_tmp__;			\
-		pto_tmp__ = (val);			\
-		(void)pto_tmp__;			\
-	}						\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "qi" ((pto_T__)(val)));		\
-		break;					\
-	case 2:						\
-		asm qual (op "w %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 4:						\
-		asm qual (op "l %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 8:						\
-		asm qual (op "q %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "re" ((pto_T__)(val)));		\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
+#define percpu_to_op(size, qual, op, _var, _val)			\
+do {									\
+	__pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);	\
+	if (0) {		                                        \
+		typeof(_var) pto_tmp__;					\
+		pto_tmp__ = (_val);					\
+		(void)pto_tmp__;					\
+	}								\
+	asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))	\
+	    : [var] "+m" (_var)						\
+	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
 /*
@@ -425,18 +405,18 @@ do {									\
 #define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
 
-#define raw_cpu_write_1(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val)	percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val)		percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val)		percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val)		percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val)		percpu_to_op(4, , "or", (pcp), val)
 
 /*
  * raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do {									\
 #define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
 #define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do {									\
  */
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val)		percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val)			percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val)			percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val)		percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val)			percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/7] x86/percpu: Clean up percpu_from_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
  2020-05-17 15:29 ` [PATCH 1/7] x86/percpu: Introduce size abstraction macros Brian Gerst
  2020-05-17 15:29 ` [PATCH 2/7] x86/percpu: Clean up percpu_to_op() Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 21:27   ` Nick Desaulniers
  2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 233c7a78d1a6..93f33202492d 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do {									\
 	}								\
 } while (0)
 
-#define percpu_from_op(qual, op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_from_op(size, qual, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "m" (_var));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 #define percpu_stable_op(op, var)			\
@@ -401,9 +381,9 @@ do {									\
  */
 #define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
 
-#define raw_cpu_read_1(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
 
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do {									\
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do {									\
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do {									\
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/7] x86/percpu: Clean up percpu_add_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
                   ` (2 preceding siblings ...)
  2020-05-17 15:29 ` [PATCH 3/7] x86/percpu: Clean up percpu_from_op() Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-17 19:18   ` kbuild test robot
                     ` (2 more replies)
  2020-05-17 15:29 ` [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
 1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 93f33202492d..21c5013a681a 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do {									\
 	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
+#define percpu_unary_op(size, qual, op, _var)				\
+({									\
+	asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))		\
+	    : [var] "+m" (_var));					\
+})
+
 /*
  * Generate a percpu add to memory instruction and optimize code
  * if one is added or subtracted.
  */
-#define percpu_add_op(qual, var, val)					\
+#define percpu_add_op(size, qual, var, val)				\
 do {									\
-	typedef typeof(var) pao_T__;					\
 	const int pao_ID__ = (__builtin_constant_p(val) &&		\
 			      ((val) == 1 || (val) == -1)) ?		\
 				(int)(val) : 0;				\
 	if (0) {							\
-		pao_T__ pao_tmp__;					\
+		typeof(var) pao_tmp__;					\
 		pao_tmp__ = (val);					\
 		(void)pao_tmp__;					\
 	}								\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incb "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decb "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addb %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "qi" ((pao_T__)(val)));			\
-		break;							\
-	case 2:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incw "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decw "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addw %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 4:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incl "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decl "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addl %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 8:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incq "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decq "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addq %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "re" ((pao_T__)(val)));			\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
+	if (pao_ID__ == 1)						\
+		percpu_unary_op(size, qual, "inc", var);		\
+	else if (pao_ID__ == -1)					\
+		percpu_unary_op(size, qual, "dec", var);		\
+	else								\
+		percpu_to_op(size, qual, "add", var, val);		\
 } while (0)
 
 #define percpu_from_op(size, qual, op, _var)				\
@@ -228,29 +196,6 @@ do {									\
 	pfo_ret__;					\
 })
 
-#define percpu_unary_op(qual, op, var)			\
-({							\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-})
-
 /*
  * Add return operation
  */
@@ -388,9 +333,9 @@ do {									\
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
 #define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
 #define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
 #define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do {									\
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
 #define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
 #define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
 #define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do {									\
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do {									\
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
                   ` (3 preceding siblings ...)
  2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 22:46   ` Nick Desaulniers
  2020-05-17 15:29 ` [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 21c5013a681a..ac8c391a190e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do {									\
 /*
  * Add return operation
  */
-#define percpu_add_return_op(qual, var, val)				\
+#define percpu_add_return_op(size, qual, _var, _val)			\
 ({									\
-	typeof(var) paro_ret__ = val;					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("xaddb %0, "__percpu_arg(1)			\
-			    : "+q" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("xaddw %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("xaddl %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+re" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	paro_ret__ += val;						\
-	paro_ret__;							\
+	__pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);	\
+	asm qual (__pcpu_op2_##size("xadd", "%[tmp]",			\
+				     __percpu_arg([var]))		\
+		  : [tmp] __pcpu_reg_##size("+", paro_tmp__),		\
+		    [var] "+m" (_var)					\
+		  : : "memory");					\
+	(typeof(_var))(unsigned long) (paro_tmp__ + _val);		\
 })
 
 /*
@@ -377,16 +358,16 @@ do {									\
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 
-#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
 #define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
 #define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do {									\
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
@@ -427,7 +408,7 @@ do {									\
 #define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
                   ` (4 preceding siblings ...)
  2020-05-17 15:29 ` [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 22:15   ` Nick Desaulniers
  2020-05-17 15:29 ` [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
  2020-05-18 19:19 ` [PATCH 0/7] x86: Clean up percpu operations Nick Desaulniers
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
 1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac8c391a190e..3c95ab3c99cd 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do {									\
  * expensive due to the implied lock prefix.  The processor cannot prefetch
  * cachelines if xchg is used.
  */
-#define percpu_xchg_op(qual, var, nval)					\
+#define percpu_xchg_op(size, qual, _var, _nval)				\
 ({									\
-	typeof(var) pxo_ret__;						\
-	typeof(var) pxo_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%al"		\
-		    "\n1:\tcmpxchgb %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "q" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%ax"		\
-		    "\n1:\tcmpxchgw %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%eax"		\
-		    "\n1:\tcmpxchgl %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%rax"		\
-		    "\n1:\tcmpxchgq %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pxo_ret__;							\
+	__pcpu_type_##size pxo_old__;					\
+	__pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]),		\
+				    "%[oval]")				\
+		  "\n1:\t"						\
+		  __pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  "\n\tjnz 1b"						\
+		  : [oval] "=&a" (pxo_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pxo_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pxo_old__;			\
 })
 
 /*
@@ -354,9 +329,9 @@ do {									\
 #define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
 #define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
 #define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(4, volatile, pcp, nval)
 
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do {									\
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
 /*
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op()
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
                   ` (5 preceding siblings ...)
  2020-05-17 15:29 ` [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
@ 2020-05-17 15:29 ` Brian Gerst
  2020-05-18 22:28   ` Nick Desaulniers
  2020-05-18 19:19 ` [PATCH 0/7] x86: Clean up percpu operations Nick Desaulniers
  7 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-17 15:29 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 3c95ab3c99cd..b61d4fc5568e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do {									\
  * cmpxchg has no such implied lock semantics as a result it is much
  * more efficient for cpu local operations.
  */
-#define percpu_cmpxchg_op(qual, var, oval, nval)			\
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)		\
 ({									\
-	typeof(var) pco_ret__;						\
-	typeof(var) pco_old__ = (oval);					\
-	typeof(var) pco_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("cmpxchgb %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "q" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("cmpxchgw %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("cmpxchgl %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("cmpxchgq %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pco_ret__;							\
+	__pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);	\
+	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  : [oval] "+a" (pco_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pco_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pco_old__;			\
 })
 
 /*
@@ -336,16 +314,16 @@ do {									\
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
 #define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
 #define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)		\
@@ -376,7 +354,7 @@ do {									\
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do {									\
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
 
 /*
  * Pretty complex macro to generate cmpxchg16 instruction.  The instruction
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] x86/percpu: Clean up percpu_add_op()
  2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
@ 2020-05-17 19:18   ` kbuild test robot
  2020-05-17 20:04   ` kbuild test robot
  2020-05-18 23:42   ` Nick Desaulniers
  2 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-05-17 19:18 UTC (permalink / raw)
  To: Brian Gerst, linux-kernel, x86
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Nick Desaulniers, Brian Gerst

[-- Attachment #1: Type: text/plain, Size: 2790 bytes --]

Hi Brian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on dennis-percpu/for-next]
[also build test WARNING on tip/auto-latest linus/master linux/master v5.7-rc5 next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Brian-Gerst/x86-Clean-up-percpu-operations/20200517-233137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-next
config: i386-allmodconfig (attached as .config)
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-193-gb8fad4bc-dirty
        # save the attached .config to linux build tree
        make C=1 ARCH=i386 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   net/ipv4/af_inet.c:1472:59: sparse: sparse: restricted __be16 degrades to integer
   include/net/tcp.h:1521:9: sparse: sparse: cast truncates bits from constant value (1d4c0 becomes c0)
>> include/net/tcp.h:1521:9: sparse: sparse: cast truncates bits from constant value (1d4c0 becomes d4c0)

vim +1521 include/net/tcp.h

^1da177e4c3f41 Linus Torvalds  2005-04-16  1512  
7970ddc8f9ffe1 Eric Dumazet    2015-03-16  1513  bool tcp_oow_rate_limited(struct net *net, const struct sk_buff *skb,
7970ddc8f9ffe1 Eric Dumazet    2015-03-16  1514  			  int mib_idx, u32 *last_oow_ack_time);
032ee4236954eb Neal Cardwell   2015-02-06  1515  
a9c19329eccdb1 Pavel Emelyanov 2008-07-16  1516  static inline void tcp_mib_init(struct net *net)
^1da177e4c3f41 Linus Torvalds  2005-04-16  1517  {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1518  	/* See RFC 2012 */
6aef70a851ac77 Eric Dumazet    2016-04-27  1519  	TCP_ADD_STATS(net, TCP_MIB_RTOALGORITHM, 1);
6aef70a851ac77 Eric Dumazet    2016-04-27  1520  	TCP_ADD_STATS(net, TCP_MIB_RTOMIN, TCP_RTO_MIN*1000/HZ);
6aef70a851ac77 Eric Dumazet    2016-04-27 @1521  	TCP_ADD_STATS(net, TCP_MIB_RTOMAX, TCP_RTO_MAX*1000/HZ);
6aef70a851ac77 Eric Dumazet    2016-04-27  1522  	TCP_ADD_STATS(net, TCP_MIB_MAXCONN, -1);
^1da177e4c3f41 Linus Torvalds  2005-04-16  1523  }
^1da177e4c3f41 Linus Torvalds  2005-04-16  1524  

:::::: The code at line 1521 was first introduced by commit
:::::: 6aef70a851ac77967992340faaff33f44598f60a net: snmp: kill various STATS_USER() helpers

:::::: TO: Eric Dumazet <edumazet@google.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 72122 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] x86/percpu: Clean up percpu_add_op()
  2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
  2020-05-17 19:18   ` kbuild test robot
@ 2020-05-17 20:04   ` kbuild test robot
  2020-05-18 23:42   ` Nick Desaulniers
  2 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-05-17 20:04 UTC (permalink / raw)
  To: Brian Gerst, linux-kernel, x86
  Cc: kbuild-all, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Nick Desaulniers, Brian Gerst

[-- Attachment #1: Type: text/plain, Size: 29038 bytes --]

Hi Brian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on dennis-percpu/for-next]
[also build test WARNING on tip/auto-latest linus/master linux/master v5.7-rc5 next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Brian-Gerst/x86-Clean-up-percpu-operations/20200517-233137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-next
config: x86_64-allyesconfig (attached as .config)
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-193-gb8fad4bc-dirty
        # save the attached .config to linux build tree
        make C=1 ARCH=x86_64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   kernel/fork.c:995:19: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected struct task_struct [noderef] <asn:4> *owner @@    got  [noderef] <asn:4> *owner @@
   kernel/fork.c:995:19: sparse:    expected struct task_struct [noderef] <asn:4> *owner
   kernel/fork.c:995:19: sparse:    got struct task_struct *p
   kernel/fork.c:1507:38: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct refcount_struct [usertype] *r @@    got struct struct refcount_struct [usertype] *r @@
   kernel/fork.c:1507:38: sparse:    expected struct refcount_struct [usertype] *r
   kernel/fork.c:1507:38: sparse:    got struct refcount_struct [noderef] <asn:4> *
   kernel/fork.c:1516:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:1516:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:1516:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:1517:9: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected void const *from @@    got struct k_sigaction [noderevoid const *from @@
   kernel/fork.c:1517:9: sparse:    expected void const *from
   kernel/fork.c:1517:9: sparse:    got struct k_sigaction [noderef] <asn:4> *
   kernel/fork.c:1518:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:1518:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:1518:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:1610:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct qspinlock *lock @@    got struct qspinlock [struct qspinlock *lock @@
   kernel/fork.c:1610:9: sparse:    expected struct qspinlock *lock
   kernel/fork.c:1610:9: sparse:    got struct qspinlock [noderef] <asn:4> *
   kernel/fork.c:1910:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:1910:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:1910:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:1914:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:1914:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:1914:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2210:32: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected struct task_struct [noderef] <asn:4> *real_parent @@    got  [noderef] <asn:4> *real_parent @@
   kernel/fork.c:2210:32: sparse:    expected struct task_struct [noderef] <asn:4> *real_parent
   kernel/fork.c:2210:32: sparse:    got struct task_struct *
   kernel/fork.c:2216:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:2216:27: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:2216:27: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2265:54: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct list_head *head @@    got struct list_head [struct list_head *head @@
   kernel/fork.c:2265:54: sparse:    expected struct list_head *head
   kernel/fork.c:2265:54: sparse:    got struct list_head [noderef] <asn:4> *
   kernel/fork.c:2286:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:2286:29: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:2286:29: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2301:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:2301:29: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:2301:29: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2330:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct sighand_struct *sighand @@    got struct sighand_strstruct sighand_struct *sighand @@
   kernel/fork.c:2330:28: sparse:    expected struct sighand_struct *sighand
   kernel/fork.c:2330:28: sparse:    got struct sighand_struct [noderef] <asn:4> *sighand
   kernel/fork.c:2358:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:2358:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:2358:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2360:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/fork.c:2360:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/fork.c:2360:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/fork.c:2766:24: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected struct task_struct *[assigned] parent @@    got struct struct task_struct *[assigned] parent @@
   kernel/fork.c:2766:24: sparse:    expected struct task_struct *[assigned] parent
   kernel/fork.c:2766:24: sparse:    got struct task_struct [noderef] <asn:4> *real_parent
   kernel/fork.c:2847:43: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct refcount_struct const [usertype] *r @@    got sstruct refcount_struct const [usertype] *r @@
   kernel/fork.c:2847:43: sparse:    expected struct refcount_struct const [usertype] *r
   kernel/fork.c:2847:43: sparse:    got struct refcount_struct [noderef] <asn:4> *
   kernel/fork.c:1945:27: sparse: sparse: dereference of noderef expression
   kernel/fork.c:1945:27: sparse: sparse: dereference of noderef expression
   kernel/fork.c:1947:22: sparse: sparse: dereference of noderef expression
   include/linux/ptrace.h:218:45: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct task_struct *new_parent @@    got struct task_structstruct task_struct *new_parent @@
   include/linux/ptrace.h:218:45: sparse:    expected struct task_struct *new_parent
   include/linux/ptrace.h:218:45: sparse:    got struct task_struct [noderef] <asn:4> *parent
   include/linux/ptrace.h:218:62: sparse: sparse: incorrect type in argument 3 (different address spaces) @@    expected struct cred const *ptracer_cred @@    got struct cred const struct cred const *ptracer_cred @@
   include/linux/ptrace.h:218:62: sparse:    expected struct cred const *ptracer_cred
   include/linux/ptrace.h:218:62: sparse:    got struct cred const [noderef] <asn:4> *ptracer_cred
   kernel/fork.c:2263:59: sparse: sparse: dereference of noderef expression
   kernel/fork.c:2264:59: sparse: sparse: dereference of noderef expression
   include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   kernel/fork.c:987:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/fork.c:987:23: sparse:    struct task_struct [noderef] <asn:4> *
   kernel/fork.c:987:23: sparse:    struct task_struct *
--
   include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   kernel/cpu.c:827:9: sparse: sparse: context imbalance in 'clear_tasks_mm_cpumask' - different lock contexts for basic block
--
   kernel/signal.c:2919:27: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:2921:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:2921:29: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:2921:29: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3072:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3072:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3072:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3075:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3075:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3075:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3458:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3458:27: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3458:27: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3470:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3470:37: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3470:37: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3475:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3475:35: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3475:35: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3480:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3480:29: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3480:29: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3681:46: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct siginfo const [noderef] [usertype] <asn:1> *from @@    got deref] [usertype] <asn:1> *from @@
   kernel/signal.c:3681:46: sparse:    expected struct siginfo const [noderef] [usertype] <asn:1> *from
   kernel/signal.c:3681:46: sparse:    got struct siginfo [usertype] *info
   kernel/signal.c:3933:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3933:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3933:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3945:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3945:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3945:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3963:11: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected struct k_sigaction *k @@    got struct k_sigactionstruct k_sigaction *k @@
   kernel/signal.c:3963:11: sparse:    expected struct k_sigaction *k
   kernel/signal.c:3963:11: sparse:    got struct k_sigaction [noderef] <asn:4> *
   kernel/signal.c:3965:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3965:25: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3965:25: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:3995:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:3995:27: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:3995:27: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:4594:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:4594:29: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:4594:29: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:4603:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:4603:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:4603:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:4613:23: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/signal.c:4613:23: sparse:    expected struct spinlock [usertype] *lock
   kernel/signal.c:4613:23: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:69:34: sparse: sparse: dereference of noderef expression
   kernel/signal.c:528:35: sparse: sparse: dereference of noderef expression
   kernel/signal.c:556:52: sparse: sparse: dereference of noderef expression
   kernel/signal.c:1030:13: sparse: sparse: dereference of noderef expression
   include/linux/signalfd.h:21:13: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct wait_queue_head *wq_head @@    got struct wait_queue_struct wait_queue_head *wq_head @@
   include/linux/signalfd.h:21:13: sparse:    expected struct wait_queue_head *wq_head
   include/linux/signalfd.h:21:13: sparse:    got struct wait_queue_head [noderef] <asn:4> *
   include/linux/signalfd.h:22:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct wait_queue_head *wq_head @@    got struct wait_queue_struct wait_queue_head *wq_head @@
   include/linux/signalfd.h:22:17: sparse:    expected struct wait_queue_head *wq_head
   include/linux/signalfd.h:22:17: sparse:    got struct wait_queue_head [noderef] <asn:4> *
   include/linux/sched/signal.h:681:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   include/linux/sched/signal.h:681:37: sparse:    expected struct spinlock [usertype] *lock
   include/linux/sched/signal.h:681:37: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:1294:9: sparse: sparse: context imbalance in 'do_send_sig_info' - different lock contexts for basic block
   include/linux/rcupdate.h:651:9: sparse: sparse: context imbalance in '__lock_task_sighand' - different lock contexts for basic block
   include/linux/sched/signal.h:681:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   include/linux/sched/signal.h:681:37: sparse:    expected struct spinlock [usertype] *lock
   include/linux/sched/signal.h:681:37: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/signal.c:1650:35: sparse: sparse: dereference of noderef expression
   include/linux/signalfd.h:21:13: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct wait_queue_head *wq_head @@    got struct wait_queue_struct wait_queue_head *wq_head @@
   include/linux/signalfd.h:21:13: sparse:    expected struct wait_queue_head *wq_head
   include/linux/signalfd.h:21:13: sparse:    got struct wait_queue_head [noderef] <asn:4> *
   include/linux/signalfd.h:22:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct wait_queue_head *wq_head @@    got struct wait_queue_struct wait_queue_head *wq_head @@
   include/linux/signalfd.h:22:17: sparse:    expected struct wait_queue_head *wq_head
   include/linux/signalfd.h:22:17: sparse:    got struct wait_queue_head [noderef] <asn:4> *
   include/linux/sched/signal.h:681:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   include/linux/sched/signal.h:681:37: sparse:    expected struct spinlock [usertype] *lock
   include/linux/sched/signal.h:681:37: sparse:    got struct spinlock [noderef] <asn:4> *
   include/linux/rcupdate.h:653:9: sparse: sparse: context imbalance in 'send_sigqueue' - wrong count at exit
   kernel/signal.c:1934:47: sparse: sparse: dereference of noderef expression
   kernel/signal.c:1954:40: sparse: sparse: dereference of noderef expression
   kernel/signal.c:1954:40: sparse: sparse: dereference of noderef expression
   kernel/signal.c:2093:13: sparse: sparse: dereference of noderef expression
   include/linux/ptrace.h:99:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct task_struct *p1 @@    got struct task_structstruct task_struct *p1 @@
   include/linux/ptrace.h:99:40: sparse:    expected struct task_struct *p1
   include/linux/ptrace.h:99:40: sparse:    got struct task_struct [noderef] <asn:4> *real_parent
   include/linux/ptrace.h:99:60: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct task_struct *p2 @@    got struct task_structstruct task_struct *p2 @@
   include/linux/ptrace.h:99:60: sparse:    expected struct task_struct *p2
   include/linux/ptrace.h:99:60: sparse:    got struct task_struct [noderef] <asn:4> *parent
   kernel/signal.c:2304:13: sparse: sparse: context imbalance in 'do_signal_stop' - different lock contexts for basic block
   kernel/signal.c:2513:49: sparse: sparse: dereference of noderef expression
   kernel/signal.c:2513:49: sparse: sparse: dereference of noderef expression
   include/linux/ptrace.h:99:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct task_struct *p1 @@    got struct task_structstruct task_struct *p1 @@
   include/linux/ptrace.h:99:40: sparse:    expected struct task_struct *p1
   include/linux/ptrace.h:99:40: sparse:    got struct task_struct [noderef] <asn:4> *real_parent
   include/linux/ptrace.h:99:60: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct task_struct *p2 @@    got struct task_structstruct task_struct *p2 @@
   include/linux/ptrace.h:99:60: sparse:    expected struct task_struct *p2
   include/linux/ptrace.h:99:60: sparse:    got struct task_struct [noderef] <asn:4> *parent
   kernel/signal.c:2596:69: sparse: sparse: context imbalance in 'get_signal' - unexpected unlock
   include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   kernel/signal.c:3741:58: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected struct siginfo [usertype] *info @@    got struct siginfo [nostruct siginfo [usertype] *info @@
   kernel/signal.c:3741:58: sparse:    expected struct siginfo [usertype] *info
   kernel/signal.c:3741:58: sparse:    got struct siginfo [noderef] [usertype] <asn:1> *info
   kernel/signal.c:3934:33: sparse: sparse: dereference of noderef expression
--
   kernel/locking/locktorture.c:575:6: sparse: sparse: symbol 'torture_percpu_rwsem_init' was not declared. Should it be static?
   kernel/locking/locktorture.c:320:12: sparse: sparse: context imbalance in 'torture_mutex_lock' - wrong count at exit
   kernel/locking/locktorture.c:340:13: sparse: sparse: context imbalance in 'torture_mutex_unlock' - wrong count at exit
   kernel/locking/locktorture.c:362:12: sparse: sparse: context imbalance in 'torture_ww_mutex_lock' - wrong count at exit
   kernel/locking/locktorture.c:407:13: sparse: sparse: context imbalance in 'torture_ww_mutex_unlock' - wrong count at exit
   kernel/locking/locktorture.c:431:12: sparse: sparse: context imbalance in 'torture_rtmutex_lock' - wrong count at exit
   kernel/locking/locktorture.c:493:13: sparse: sparse: context imbalance in 'torture_rtmutex_unlock' - wrong count at exit
   kernel/locking/locktorture.c:511:12: sparse: sparse: context imbalance in 'torture_rwsem_down_write' - wrong count at exit
   kernel/locking/locktorture.c:531:13: sparse: sparse: context imbalance in 'torture_rwsem_up_write' - wrong count at exit
   kernel/locking/locktorture.c:536:12: sparse: sparse: context imbalance in 'torture_rwsem_down_read' - wrong count at exit
   kernel/locking/locktorture.c:556:13: sparse: sparse: context imbalance in 'torture_rwsem_up_read' - wrong count at exit
   kernel/locking/locktorture.c:580:12: sparse: sparse: context imbalance in 'torture_percpu_rwsem_down_write' - wrong count at exit
   kernel/locking/locktorture.c:586:13: sparse: sparse: context imbalance in 'torture_percpu_rwsem_up_write' - wrong count at exit
   include/linux/percpu-rwsem.h:58:9: sparse: sparse: context imbalance in 'torture_percpu_rwsem_down_read' - wrong count at exit
   include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   include/linux/percpu-rwsem.h:94:9: sparse: sparse: context imbalance in 'torture_percpu_rwsem_up_read' - wrong count at exit
--
   net/sched/sch_generic.c:189:50: sparse: sparse: context imbalance in 'try_bulk_dequeue_skb_slow' - different lock contexts for basic block
   include/net/sch_generic.h:888:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/net/sch_generic.h:888:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   net/sched/sch_generic.c:248:17: sparse: sparse: context imbalance in 'dequeue_skb' - different lock contexts for basic block
   net/sched/sch_generic.c:294:28: sparse: sparse: context imbalance in 'sch_direct_xmit' - unexpected unlock
   net/sched/sch_generic.c:1127:9: sparse: sparse: context imbalance in 'dev_deactivate_queue' - different lock contexts for basic block
--
   kernel/events/uprobes.c:1978:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/events/uprobes.c:1978:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/events/uprobes.c:1978:33: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/events/uprobes.c:1980:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/events/uprobes.c:1980:35: sparse:    expected struct spinlock [usertype] *lock
   kernel/events/uprobes.c:1980:35: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/events/uprobes.c:2277:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/events/uprobes.c:2277:31: sparse:    expected struct spinlock [usertype] *lock
   kernel/events/uprobes.c:2277:31: sparse:    got struct spinlock [noderef] <asn:4> *
   kernel/events/uprobes.c:2279:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct spinlock [usertype] *lock @@    got struct struct spinlock [usertype] *lock @@
   kernel/events/uprobes.c:2279:33: sparse:    expected struct spinlock [usertype] *lock
   kernel/events/uprobes.c:2279:33: sparse:    got struct spinlock [noderef] <asn:4> *
   include/linux/rmap.h:220:28: sparse: sparse: context imbalance in '__replace_page' - unexpected unlock
   include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
--
   drivers/infiniband/hw/hfi1/pio_copy.c:421:24: sparse: sparse: cast removes address space '<asn:2>' of expression
   drivers/infiniband/hw/hfi1/pio_copy.c:421:24: sparse: sparse: cast removes address space '<asn:2>' of expression
   drivers/infiniband/hw/hfi1/pio_copy.c:547:24: sparse: sparse: cast removes address space '<asn:2>' of expression
   drivers/infiniband/hw/hfi1/pio_copy.c:547:24: sparse: sparse: cast removes address space '<asn:2>' of expression
   drivers/infiniband/hw/hfi1/pio_copy.c:164:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
>> drivers/infiniband/hw/hfi1/pio_copy.c:164:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
   drivers/infiniband/hw/hfi1/pio_copy.c:755:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ff)
   drivers/infiniband/hw/hfi1/pio_copy.c:755:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)

vim +91 include/linux/percpu-rwsem.h

80127a39681bd6 Peter Zijlstra  2016-07-14  83  
02e525b2aff1d6 Peter Zijlstra  2019-02-21  84  static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
80127a39681bd6 Peter Zijlstra  2016-07-14  85  {
02e525b2aff1d6 Peter Zijlstra  2019-02-21  86  	preempt_disable();
80127a39681bd6 Peter Zijlstra  2016-07-14  87  	/*
80127a39681bd6 Peter Zijlstra  2016-07-14  88  	 * Same as in percpu_down_read().
80127a39681bd6 Peter Zijlstra  2016-07-14  89  	 */
80127a39681bd6 Peter Zijlstra  2016-07-14  90  	if (likely(rcu_sync_is_idle(&sem->rss)))
80127a39681bd6 Peter Zijlstra  2016-07-14 @91  		__this_cpu_dec(*sem->read_count);
80127a39681bd6 Peter Zijlstra  2016-07-14  92  	else
80127a39681bd6 Peter Zijlstra  2016-07-14  93  		__percpu_up_read(sem); /* Unconditional memory barrier */
80127a39681bd6 Peter Zijlstra  2016-07-14  94  	preempt_enable();
80127a39681bd6 Peter Zijlstra  2016-07-14  95  
5facae4f3549b5 Qian Cai        2019-09-19  96  	rwsem_release(&sem->rw_sem.dep_map, _RET_IP_);
80127a39681bd6 Peter Zijlstra  2016-07-14  97  }
5c1eabe68501d1 Mikulas Patocka 2012-10-22  98  

:::::: The code at line 91 was first introduced by commit
:::::: 80127a39681bd68c959f0953f84a830cbd7c3b1c locking/percpu-rwsem: Optimize readers and reduce global impact

:::::: TO: Peter Zijlstra <peterz@infradead.org>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 72383 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] x86: Clean up percpu operations
  2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
                   ` (6 preceding siblings ...)
  2020-05-17 15:29 ` [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
@ 2020-05-18 19:19 ` Nick Desaulniers
  2020-05-18 23:54   ` Nick Desaulniers
  7 siblings, 1 reply; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 19:19 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu operations already have a switch on the width of the
> data type, which resulted in an extra amount of dead code being

Thanks for the series Brian.  The duplication between knowing the size
at the call site and the switch is exactly what I picked up on, too.
This past weekend I started a similar cleanup, but only got through 2
of the 6 functions you cleaned up.  And via quick glance, your series
looks much better than mine does.  I'll sit down and review+test these
after lunch.  I appreciate you taking the time to create this series.

> generated with the x86 operations having another switch.  This patch set
> rewrites the x86 ops to remove the switch.  Additional cleanups are to
> use named assembly operands, and to cast variables to the width used in
> the assembly to make Clang happy.
>
> Brian Gerst (7):
>   x86/percpu: Introduce size abstraction macros
>   x86/percpu: Clean up percpu_to_op()
>   x86/percpu: Clean up percpu_from_op()
>   x86/percpu: Clean up percpu_add_op()
>   x86/percpu: Clean up percpu_add_return_op()
>   x86/percpu: Clean up percpu_xchg_op()
>   x86/percpu: Clean up percpu_cmpxchg_op()
>
>  arch/x86/include/asm/percpu.h | 447 ++++++++++++----------------------
>  1 file changed, 158 insertions(+), 289 deletions(-)
>
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-17 15:29 ` [PATCH 2/7] x86/percpu: Clean up percpu_to_op() Brian Gerst
@ 2020-05-18 21:15   ` Nick Desaulniers
  2020-05-19  3:38     ` Brian Gerst
  0 siblings, 1 reply; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 21:15 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
>  1 file changed, 35 insertions(+), 55 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 89f918a3e99b..233c7a78d1a6 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
>  #define __pcpu_reg_imm_4(x) "ri" (x)
>  #define __pcpu_reg_imm_8(x) "re" (x)
>
> -#define percpu_to_op(qual, op, var, val)               \
> -do {                                                   \
> -       typedef typeof(var) pto_T__;                    \
> -       if (0) {                                        \
> -               pto_T__ pto_tmp__;                      \
> -               pto_tmp__ = (val);                      \
> -               (void)pto_tmp__;                        \
> -       }                                               \
> -       switch (sizeof(var)) {                          \
> -       case 1:                                         \
> -               asm qual (op "b %1,"__percpu_arg(0)     \
> -                   : "+m" (var)                        \
> -                   : "qi" ((pto_T__)(val)));           \
> -               break;                                  \
> -       case 2:                                         \
> -               asm qual (op "w %1,"__percpu_arg(0)     \
> -                   : "+m" (var)                        \
> -                   : "ri" ((pto_T__)(val)));           \
> -               break;                                  \
> -       case 4:                                         \
> -               asm qual (op "l %1,"__percpu_arg(0)     \
> -                   : "+m" (var)                        \
> -                   : "ri" ((pto_T__)(val)));           \
> -               break;                                  \
> -       case 8:                                         \
> -               asm qual (op "q %1,"__percpu_arg(0)     \
> -                   : "+m" (var)                        \
> -                   : "re" ((pto_T__)(val)));           \
> -               break;                                  \
> -       default: __bad_percpu_size();                   \
> -       }                                               \
> +#define percpu_to_op(size, qual, op, _var, _val)                       \
> +do {                                                                   \
> +       __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);        \
> +       if (0) {                                                        \
> +               typeof(_var) pto_tmp__;                                 \
> +               pto_tmp__ = (_val);                                     \
> +               (void)pto_tmp__;                                        \
> +       }                                                               \

Please replace the whole `if (0)` block with:
```c
__same_type(_var, _val);
```
from include/linux/compiler.h.

> +       asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))   \
> +           : [var] "+m" (_var)                                         \
> +           : [val] __pcpu_reg_imm_##size(pto_val__));                  \
>  } while (0)
>
>  /*
> @@ -425,18 +405,18 @@ do {                                                                      \
>  #define raw_cpu_read_2(pcp)            percpu_from_op(, "mov", pcp)
>  #define raw_cpu_read_4(pcp)            percpu_from_op(, "mov", pcp)
>
> -#define raw_cpu_write_1(pcp, val)      percpu_to_op(, "mov", (pcp), val)
> -#define raw_cpu_write_2(pcp, val)      percpu_to_op(, "mov", (pcp), val)
> -#define raw_cpu_write_4(pcp, val)      percpu_to_op(, "mov", (pcp), val)
> +#define raw_cpu_write_1(pcp, val)      percpu_to_op(1, , "mov", (pcp), val)
> +#define raw_cpu_write_2(pcp, val)      percpu_to_op(2, , "mov", (pcp), val)
> +#define raw_cpu_write_4(pcp, val)      percpu_to_op(4, , "mov", (pcp), val)
>  #define raw_cpu_add_1(pcp, val)                percpu_add_op(, (pcp), val)
>  #define raw_cpu_add_2(pcp, val)                percpu_add_op(, (pcp), val)
>  #define raw_cpu_add_4(pcp, val)                percpu_add_op(, (pcp), val)
> -#define raw_cpu_and_1(pcp, val)                percpu_to_op(, "and", (pcp), val)
> -#define raw_cpu_and_2(pcp, val)                percpu_to_op(, "and", (pcp), val)
> -#define raw_cpu_and_4(pcp, val)                percpu_to_op(, "and", (pcp), val)
> -#define raw_cpu_or_1(pcp, val)         percpu_to_op(, "or", (pcp), val)
> -#define raw_cpu_or_2(pcp, val)         percpu_to_op(, "or", (pcp), val)
> -#define raw_cpu_or_4(pcp, val)         percpu_to_op(, "or", (pcp), val)
> +#define raw_cpu_and_1(pcp, val)                percpu_to_op(1, , "and", (pcp), val)
> +#define raw_cpu_and_2(pcp, val)                percpu_to_op(2, , "and", (pcp), val)
> +#define raw_cpu_and_4(pcp, val)                percpu_to_op(4, , "and", (pcp), val)
> +#define raw_cpu_or_1(pcp, val)         percpu_to_op(1, , "or", (pcp), val)
> +#define raw_cpu_or_2(pcp, val)         percpu_to_op(2, , "or", (pcp), val)
> +#define raw_cpu_or_4(pcp, val)         percpu_to_op(4, , "or", (pcp), val)
>
>  /*
>   * raw_cpu_xchg() can use a load-store since it is not required to be
> @@ -456,18 +436,18 @@ do {                                                                      \
>  #define this_cpu_read_1(pcp)           percpu_from_op(volatile, "mov", pcp)
>  #define this_cpu_read_2(pcp)           percpu_from_op(volatile, "mov", pcp)
>  #define this_cpu_read_4(pcp)           percpu_from_op(volatile, "mov", pcp)
> -#define this_cpu_write_1(pcp, val)     percpu_to_op(volatile, "mov", (pcp), val)
> -#define this_cpu_write_2(pcp, val)     percpu_to_op(volatile, "mov", (pcp), val)
> -#define this_cpu_write_4(pcp, val)     percpu_to_op(volatile, "mov", (pcp), val)
> +#define this_cpu_write_1(pcp, val)     percpu_to_op(1, volatile, "mov", (pcp), val)
> +#define this_cpu_write_2(pcp, val)     percpu_to_op(2, volatile, "mov", (pcp), val)
> +#define this_cpu_write_4(pcp, val)     percpu_to_op(4, volatile, "mov", (pcp), val)
>  #define this_cpu_add_1(pcp, val)       percpu_add_op(volatile, (pcp), val)
>  #define this_cpu_add_2(pcp, val)       percpu_add_op(volatile, (pcp), val)
>  #define this_cpu_add_4(pcp, val)       percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_and_1(pcp, val)       percpu_to_op(volatile, "and", (pcp), val)
> -#define this_cpu_and_2(pcp, val)       percpu_to_op(volatile, "and", (pcp), val)
> -#define this_cpu_and_4(pcp, val)       percpu_to_op(volatile, "and", (pcp), val)
> -#define this_cpu_or_1(pcp, val)                percpu_to_op(volatile, "or", (pcp), val)
> -#define this_cpu_or_2(pcp, val)                percpu_to_op(volatile, "or", (pcp), val)
> -#define this_cpu_or_4(pcp, val)                percpu_to_op(volatile, "or", (pcp), val)
> +#define this_cpu_and_1(pcp, val)       percpu_to_op(1, volatile, "and", (pcp), val)
> +#define this_cpu_and_2(pcp, val)       percpu_to_op(2, volatile, "and", (pcp), val)
> +#define this_cpu_and_4(pcp, val)       percpu_to_op(4, volatile, "and", (pcp), val)
> +#define this_cpu_or_1(pcp, val)                percpu_to_op(1, volatile, "or", (pcp), val)
> +#define this_cpu_or_2(pcp, val)                percpu_to_op(2, volatile, "or", (pcp), val)
> +#define this_cpu_or_4(pcp, val)                percpu_to_op(4, volatile, "or", (pcp), val)
>  #define this_cpu_xchg_1(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_xchg_2(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_xchg_4(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
> @@ -509,19 +489,19 @@ do {                                                                      \
>   */
>  #ifdef CONFIG_X86_64
>  #define raw_cpu_read_8(pcp)                    percpu_from_op(, "mov", pcp)
> -#define raw_cpu_write_8(pcp, val)              percpu_to_op(, "mov", (pcp), val)
> +#define raw_cpu_write_8(pcp, val)              percpu_to_op(8, , "mov", (pcp), val)
>  #define raw_cpu_add_8(pcp, val)                        percpu_add_op(, (pcp), val)
> -#define raw_cpu_and_8(pcp, val)                        percpu_to_op(, "and", (pcp), val)
> -#define raw_cpu_or_8(pcp, val)                 percpu_to_op(, "or", (pcp), val)
> +#define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
> +#define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
>  #define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(, pcp, val)
>  #define raw_cpu_xchg_8(pcp, nval)              raw_percpu_xchg_op(pcp, nval)
>  #define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
>  #define this_cpu_read_8(pcp)                   percpu_from_op(volatile, "mov", pcp)
> -#define this_cpu_write_8(pcp, val)             percpu_to_op(volatile, "mov", (pcp), val)
> +#define this_cpu_write_8(pcp, val)             percpu_to_op(8, volatile, "mov", (pcp), val)
>  #define this_cpu_add_8(pcp, val)               percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_and_8(pcp, val)               percpu_to_op(volatile, "and", (pcp), val)
> -#define this_cpu_or_8(pcp, val)                        percpu_to_op(volatile, "or", (pcp), val)
> +#define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
> +#define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
>  #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(volatile, pcp, val)
>  #define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] x86/percpu: Clean up percpu_from_op()
  2020-05-17 15:29 ` [PATCH 3/7] x86/percpu: Clean up percpu_from_op() Brian Gerst
@ 2020-05-18 21:27   ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 21:27 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
>  1 file changed, 15 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 233c7a78d1a6..93f33202492d 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -190,33 +190,13 @@ do {                                                                      \
>         }                                                               \
>  } while (0)
>
> -#define percpu_from_op(qual, op, var)                  \
> -({                                                     \
> -       typeof(var) pfo_ret__;                          \
> -       switch (sizeof(var)) {                          \
> -       case 1:                                         \
> -               asm qual (op "b "__percpu_arg(1)",%0"   \
> -                   : "=q" (pfo_ret__)                  \
> -                   : "m" (var));                       \
> -               break;                                  \
> -       case 2:                                         \
> -               asm qual (op "w "__percpu_arg(1)",%0"   \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "m" (var));                       \
> -               break;                                  \
> -       case 4:                                         \
> -               asm qual (op "l "__percpu_arg(1)",%0"   \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "m" (var));                       \
> -               break;                                  \
> -       case 8:                                         \
> -               asm qual (op "q "__percpu_arg(1)",%0"   \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "m" (var));                       \
> -               break;                                  \
> -       default: __bad_percpu_size();                   \
> -       }                                               \
> -       pfo_ret__;                                      \
> +#define percpu_from_op(size, qual, op, _var)                           \
> +({                                                                     \
> +       __pcpu_type_##size pfo_val__;                                   \
> +       asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")  \
> +           : [val] __pcpu_reg_##size("=", pfo_val__)                   \
> +           : [var] "m" (_var));                                        \
> +       (typeof(_var))(unsigned long) pfo_val__;                        \
>  })
>
>  #define percpu_stable_op(op, var)                      \
> @@ -401,9 +381,9 @@ do {                                                                        \
>   */
>  #define this_cpu_read_stable(var)      percpu_stable_op("mov", var)
>
> -#define raw_cpu_read_1(pcp)            percpu_from_op(, "mov", pcp)
> -#define raw_cpu_read_2(pcp)            percpu_from_op(, "mov", pcp)
> -#define raw_cpu_read_4(pcp)            percpu_from_op(, "mov", pcp)
> +#define raw_cpu_read_1(pcp)            percpu_from_op(1, , "mov", pcp)
> +#define raw_cpu_read_2(pcp)            percpu_from_op(2, , "mov", pcp)
> +#define raw_cpu_read_4(pcp)            percpu_from_op(4, , "mov", pcp)
>
>  #define raw_cpu_write_1(pcp, val)      percpu_to_op(1, , "mov", (pcp), val)
>  #define raw_cpu_write_2(pcp, val)      percpu_to_op(2, , "mov", (pcp), val)
> @@ -433,9 +413,9 @@ do {                                                                        \
>  #define raw_cpu_xchg_2(pcp, val)       raw_percpu_xchg_op(pcp, val)
>  #define raw_cpu_xchg_4(pcp, val)       raw_percpu_xchg_op(pcp, val)
>
> -#define this_cpu_read_1(pcp)           percpu_from_op(volatile, "mov", pcp)
> -#define this_cpu_read_2(pcp)           percpu_from_op(volatile, "mov", pcp)
> -#define this_cpu_read_4(pcp)           percpu_from_op(volatile, "mov", pcp)
> +#define this_cpu_read_1(pcp)           percpu_from_op(1, volatile, "mov", pcp)
> +#define this_cpu_read_2(pcp)           percpu_from_op(2, volatile, "mov", pcp)
> +#define this_cpu_read_4(pcp)           percpu_from_op(4, volatile, "mov", pcp)
>  #define this_cpu_write_1(pcp, val)     percpu_to_op(1, volatile, "mov", (pcp), val)
>  #define this_cpu_write_2(pcp, val)     percpu_to_op(2, volatile, "mov", (pcp), val)
>  #define this_cpu_write_4(pcp, val)     percpu_to_op(4, volatile, "mov", (pcp), val)
> @@ -488,7 +468,7 @@ do {                                                                        \
>   * 32 bit must fall back to generic operations.
>   */
>  #ifdef CONFIG_X86_64
> -#define raw_cpu_read_8(pcp)                    percpu_from_op(, "mov", pcp)
> +#define raw_cpu_read_8(pcp)                    percpu_from_op(8, , "mov", pcp)
>  #define raw_cpu_write_8(pcp, val)              percpu_to_op(8, , "mov", (pcp), val)
>  #define raw_cpu_add_8(pcp, val)                        percpu_add_op(, (pcp), val)
>  #define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
> @@ -497,7 +477,7 @@ do {                                                                        \
>  #define raw_cpu_xchg_8(pcp, nval)              raw_percpu_xchg_op(pcp, nval)
>  #define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
> -#define this_cpu_read_8(pcp)                   percpu_from_op(volatile, "mov", pcp)
> +#define this_cpu_read_8(pcp)                   percpu_from_op(8, volatile, "mov", pcp)
>  #define this_cpu_write_8(pcp, val)             percpu_to_op(8, volatile, "mov", (pcp), val)
>  #define this_cpu_add_8(pcp, val)               percpu_add_op(volatile, (pcp), val)
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op()
  2020-05-17 15:29 ` [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
@ 2020-05-18 22:15   ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 22:15 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
>  1 file changed, 18 insertions(+), 43 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index ac8c391a190e..3c95ab3c99cd 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -215,46 +215,21 @@ do {                                                                      \
>   * expensive due to the implied lock prefix.  The processor cannot prefetch
>   * cachelines if xchg is used.
>   */
> -#define percpu_xchg_op(qual, var, nval)                                        \
> +#define percpu_xchg_op(size, qual, _var, _nval)                                \
>  ({                                                                     \
> -       typeof(var) pxo_ret__;                                          \
> -       typeof(var) pxo_new__ = (nval);                                 \
> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               asm qual ("\n\tmov "__percpu_arg(1)",%%al"              \
> -                   "\n1:\tcmpxchgb %2, "__percpu_arg(1)                \
> -                   "\n\tjnz 1b"                                        \
> -                           : "=&a" (pxo_ret__), "+m" (var)             \
> -                           : "q" (pxo_new__)                           \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 2:                                                         \
> -               asm qual ("\n\tmov "__percpu_arg(1)",%%ax"              \
> -                   "\n1:\tcmpxchgw %2, "__percpu_arg(1)                \
> -                   "\n\tjnz 1b"                                        \
> -                           : "=&a" (pxo_ret__), "+m" (var)             \
> -                           : "r" (pxo_new__)                           \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 4:                                                         \
> -               asm qual ("\n\tmov "__percpu_arg(1)",%%eax"             \
> -                   "\n1:\tcmpxchgl %2, "__percpu_arg(1)                \
> -                   "\n\tjnz 1b"                                        \
> -                           : "=&a" (pxo_ret__), "+m" (var)             \
> -                           : "r" (pxo_new__)                           \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 8:                                                         \
> -               asm qual ("\n\tmov "__percpu_arg(1)",%%rax"             \
> -                   "\n1:\tcmpxchgq %2, "__percpu_arg(1)                \
> -                   "\n\tjnz 1b"                                        \
> -                           : "=&a" (pxo_ret__), "+m" (var)             \
> -                           : "r" (pxo_new__)                           \
> -                           : "memory");                                \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \
> -       }                                                               \
> -       pxo_ret__;                                                      \
> +       __pcpu_type_##size pxo_old__;                                   \
> +       __pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval);       \
> +       asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]),         \
> +                                   "%[oval]")                          \
> +                 "\n1:\t"                                              \
> +                 __pcpu_op2_##size("cmpxchg", "%[nval]",               \
> +                                   __percpu_arg([var]))                \
> +                 "\n\tjnz 1b"                                          \
> +                 : [oval] "=&a" (pxo_old__),                           \
> +                   [var] "+m" (_var)                                   \
> +                 : [nval] __pcpu_reg_##size(, pxo_new__)               \
> +                 : "memory");                                          \
> +       (typeof(_var))(unsigned long) pxo_old__;                        \
>  })
>
>  /*
> @@ -354,9 +329,9 @@ do {                                                                        \
>  #define this_cpu_or_1(pcp, val)                percpu_to_op(1, volatile, "or", (pcp), val)
>  #define this_cpu_or_2(pcp, val)                percpu_to_op(2, volatile, "or", (pcp), val)
>  #define this_cpu_or_4(pcp, val)                percpu_to_op(4, volatile, "or", (pcp), val)
> -#define this_cpu_xchg_1(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
> -#define this_cpu_xchg_2(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
> -#define this_cpu_xchg_4(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
> +#define this_cpu_xchg_1(pcp, nval)     percpu_xchg_op(1, volatile, pcp, nval)
> +#define this_cpu_xchg_2(pcp, nval)     percpu_xchg_op(2, volatile, pcp, nval)
> +#define this_cpu_xchg_4(pcp, nval)     percpu_xchg_op(4, volatile, pcp, nval)
>
>  #define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(1, , pcp, val)
>  #define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(2, , pcp, val)
> @@ -409,7 +384,7 @@ do {                                                                        \
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
>  #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(8, volatile, pcp, val)
> -#define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(volatile, pcp, nval)
> +#define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(8, volatile, pcp, nval)
>  #define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>
>  /*
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op()
  2020-05-17 15:29 ` [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
@ 2020-05-18 22:28   ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 22:28 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
>  1 file changed, 18 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 3c95ab3c99cd..b61d4fc5568e 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -236,39 +236,17 @@ do {                                                                      \
>   * cmpxchg has no such implied lock semantics as a result it is much
>   * more efficient for cpu local operations.
>   */
> -#define percpu_cmpxchg_op(qual, var, oval, nval)                       \
> +#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)              \
>  ({                                                                     \
> -       typeof(var) pco_ret__;                                          \
> -       typeof(var) pco_old__ = (oval);                                 \
> -       typeof(var) pco_new__ = (nval);                                 \
> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               asm qual ("cmpxchgb %2, "__percpu_arg(1)                \
> -                           : "=a" (pco_ret__), "+m" (var)              \
> -                           : "q" (pco_new__), "0" (pco_old__)          \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 2:                                                         \
> -               asm qual ("cmpxchgw %2, "__percpu_arg(1)                \
> -                           : "=a" (pco_ret__), "+m" (var)              \
> -                           : "r" (pco_new__), "0" (pco_old__)          \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 4:                                                         \
> -               asm qual ("cmpxchgl %2, "__percpu_arg(1)                \
> -                           : "=a" (pco_ret__), "+m" (var)              \
> -                           : "r" (pco_new__), "0" (pco_old__)          \
> -                           : "memory");                                \
> -               break;                                                  \
> -       case 8:                                                         \
> -               asm qual ("cmpxchgq %2, "__percpu_arg(1)                \
> -                           : "=a" (pco_ret__), "+m" (var)              \
> -                           : "r" (pco_new__), "0" (pco_old__)          \
> -                           : "memory");                                \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \
> -       }                                                               \
> -       pco_ret__;                                                      \
> +       __pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);       \
> +       __pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);       \
> +       asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",               \
> +                                   __percpu_arg([var]))                \
> +                 : [oval] "+a" (pco_old__),                            \
> +                   [var] "+m" (_var)                                   \
> +                 : [nval] __pcpu_reg_##size(, pco_new__)               \

Looks like we're no longer using "=a" and "0" constraints.  Looking
these up for reference for other reviewers:

"0" [0]:
Input constraints can also be digits (for example, "0"). This
indicates that the specified input must be in the same place as the
output constraint at the (zero-based) index in the output constraint
list. When using asmSymbolicName syntax for the output operands, you
may use these names (enclosed in brackets ‘[]’) instead of digits.
[0] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm

"+" [1]:
Means that this operand is both read and written by the instruction.
[1] https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers

"=" [1]:
Means that this operand is written to by this instruction: the
previous value is discarded and replaced by new data.

So this looks like a nice simplification.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> +                 : "memory");                                          \
> +       (typeof(_var))(unsigned long) pco_old__;                        \
>  })
>
>  /*
> @@ -336,16 +314,16 @@ do {                                                                      \
>  #define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(1, , pcp, val)
>  #define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(2, , pcp, val)
>  #define raw_cpu_add_return_4(pcp, val)         percpu_add_return_op(4, , pcp, val)
> -#define raw_cpu_cmpxchg_1(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
> -#define raw_cpu_cmpxchg_2(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
> -#define raw_cpu_cmpxchg_4(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
> +#define raw_cpu_cmpxchg_1(pcp, oval, nval)     percpu_cmpxchg_op(1, , pcp, oval, nval)
> +#define raw_cpu_cmpxchg_2(pcp, oval, nval)     percpu_cmpxchg_op(2, , pcp, oval, nval)
> +#define raw_cpu_cmpxchg_4(pcp, oval, nval)     percpu_cmpxchg_op(4, , pcp, oval, nval)
>
>  #define this_cpu_add_return_1(pcp, val)                percpu_add_return_op(1, volatile, pcp, val)
>  #define this_cpu_add_return_2(pcp, val)                percpu_add_return_op(2, volatile, pcp, val)
>  #define this_cpu_add_return_4(pcp, val)                percpu_add_return_op(4, volatile, pcp, val)
> -#define this_cpu_cmpxchg_1(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> -#define this_cpu_cmpxchg_2(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> -#define this_cpu_cmpxchg_4(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> +#define this_cpu_cmpxchg_1(pcp, oval, nval)    percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
> +#define this_cpu_cmpxchg_2(pcp, oval, nval)    percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
> +#define this_cpu_cmpxchg_4(pcp, oval, nval)    percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
>
>  #ifdef CONFIG_X86_CMPXCHG64
>  #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)            \
> @@ -376,7 +354,7 @@ do {                                                                        \
>  #define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
>  #define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(8, , pcp, val)
>  #define raw_cpu_xchg_8(pcp, nval)              raw_percpu_xchg_op(pcp, nval)
> -#define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
> +#define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(8, , pcp, oval, nval)
>
>  #define this_cpu_read_8(pcp)                   percpu_from_op(8, volatile, "mov", pcp)
>  #define this_cpu_write_8(pcp, val)             percpu_to_op(8, volatile, "mov", (pcp), val)
> @@ -385,7 +363,7 @@ do {                                                                        \
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
>  #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(8, volatile, pcp, val)
>  #define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(8, volatile, pcp, nval)
> -#define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> +#define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
>
>  /*
>   * Pretty complex macro to generate cmpxchg16 instruction.  The instruction
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op()
  2020-05-17 15:29 ` [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
@ 2020-05-18 22:46   ` Nick Desaulniers
  2020-05-18 23:45     ` Brian Gerst
  0 siblings, 1 reply; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 22:46 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
>  1 file changed, 16 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 21c5013a681a..ac8c391a190e 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -199,34 +199,15 @@ do {                                                                      \
>  /*
>   * Add return operation
>   */
> -#define percpu_add_return_op(qual, var, val)                           \
> +#define percpu_add_return_op(size, qual, _var, _val)                   \
>  ({                                                                     \
> -       typeof(var) paro_ret__ = val;                                   \
> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               asm qual ("xaddb %0, "__percpu_arg(1)                   \
> -                           : "+q" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 2:                                                         \
> -               asm qual ("xaddw %0, "__percpu_arg(1)                   \
> -                           : "+r" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 4:                                                         \
> -               asm qual ("xaddl %0, "__percpu_arg(1)                   \
> -                           : "+r" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 8:                                                         \
> -               asm qual ("xaddq %0, "__percpu_arg(1)                   \
> -                           : "+re" (paro_ret__), "+m" (var)            \

^ before we use the "+re" constraint for 8B input.

> -                           : : "memory");                              \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \

Comment on the series as a whole.  After applying the series, the
final reference to __bad_percpu_size and switch statement in
arch/x86/include/asm/percpu.h in the definition of the
percpu_stable_op() macro.  If you clean that up, too, then the rest of
this file feels more consistent with your series, even if it's not a
blocker for Clang i386 support. Then you can get rid of
__bad_percpu_size, too!


> -       }                                                               \
> -       paro_ret__ += val;                                              \
> -       paro_ret__;                                                     \
> +       __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);       \
> +       asm qual (__pcpu_op2_##size("xadd", "%[tmp]",                   \
> +                                    __percpu_arg([var]))               \
> +                 : [tmp] __pcpu_reg_##size("+", paro_tmp__),           \

^ after, for `size == 8`, we use "+r". [0] says for "e":

32-bit signed integer constant, or a symbolic reference known to fit
that range (for immediate operands in sign-extending x86-64
instructions).

I'm guessing we're restricting the input to not allow for 64b signed
integer constants?  Looking at the documentation for `xadd` (ie.
"exchange and add") [1], it looks like immediates are not allowed as
operands, only registers or memory addresses.  So it seems that "e"
was never necessary.  It might be helpful to note that in the commit
message, should you end up sending a v2 of the series.  Maybe some
folks with more x86 inline asm experience can triple check/verify?

Assuming we're good to go, LGTM.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

[0] https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
[1] https://www.felixcloutier.com/x86/xadd


> +                   [var] "+m" (_var)                                   \
> +                 : : "memory");                                        \
> +       (typeof(_var))(unsigned long) (paro_tmp__ + _val);              \
>  })
>
>  /*
> @@ -377,16 +358,16 @@ do {                                                                      \
>  #define this_cpu_xchg_2(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_xchg_4(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>
> -#define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_4(pcp, val)         percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(1, , pcp, val)
> +#define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(2, , pcp, val)
> +#define raw_cpu_add_return_4(pcp, val)         percpu_add_return_op(4, , pcp, val)
>  #define raw_cpu_cmpxchg_1(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>  #define raw_cpu_cmpxchg_2(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>  #define raw_cpu_cmpxchg_4(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
> -#define this_cpu_add_return_1(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_2(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_4(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_1(pcp, val)                percpu_add_return_op(1, volatile, pcp, val)
> +#define this_cpu_add_return_2(pcp, val)                percpu_add_return_op(2, volatile, pcp, val)
> +#define this_cpu_add_return_4(pcp, val)                percpu_add_return_op(4, volatile, pcp, val)
>  #define this_cpu_cmpxchg_1(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>  #define this_cpu_cmpxchg_2(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>  #define this_cpu_cmpxchg_4(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> @@ -418,7 +399,7 @@ do {                                                                        \
>  #define raw_cpu_add_8(pcp, val)                        percpu_add_op(8, , (pcp), val)
>  #define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
>  #define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
> -#define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(8, , pcp, val)
>  #define raw_cpu_xchg_8(pcp, nval)              raw_percpu_xchg_op(pcp, nval)
>  #define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
> @@ -427,7 +408,7 @@ do {                                                                        \
>  #define this_cpu_add_8(pcp, val)               percpu_add_op(8, volatile, (pcp), val)
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
> -#define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(8, volatile, pcp, val)
>  #define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] x86/percpu: Clean up percpu_add_op()
  2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
  2020-05-17 19:18   ` kbuild test robot
  2020-05-17 20:04   ` kbuild test robot
@ 2020-05-18 23:42   ` Nick Desaulniers
  2 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 23:42 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
>  1 file changed, 22 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 93f33202492d..21c5013a681a 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -130,64 +130,32 @@ do {                                                                      \
>             : [val] __pcpu_reg_imm_##size(pto_val__));                  \
>  } while (0)
>
> +#define percpu_unary_op(size, qual, op, _var)                          \
> +({                                                                     \
> +       asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))            \
> +           : [var] "+m" (_var));                                       \
> +})
> +
>  /*
>   * Generate a percpu add to memory instruction and optimize code
>   * if one is added or subtracted.
>   */
> -#define percpu_add_op(qual, var, val)                                  \
> +#define percpu_add_op(size, qual, var, val)                            \
>  do {                                                                   \
> -       typedef typeof(var) pao_T__;                                    \
>         const int pao_ID__ = (__builtin_constant_p(val) &&              \
>                               ((val) == 1 || (val) == -1)) ?            \
>                                 (int)(val) : 0;                         \
>         if (0) {                                                        \
> -               pao_T__ pao_tmp__;                                      \
> +               typeof(var) pao_tmp__;                                  \
>                 pao_tmp__ = (val);                                      \
>                 (void)pao_tmp__;                                        \
>         }                                                               \

Prefer __same_type() from include/linux/compiler_types.h here as well.
Nice elimination of the typedef.

> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addb %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "qi" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 2:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addw %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "ri" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 4:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addl %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "ri" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 8:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addq %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "re" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \
> -       }                                                               \
> +       if (pao_ID__ == 1)                                              \
> +               percpu_unary_op(size, qual, "inc", var);                \
> +       else if (pao_ID__ == -1)                                        \
> +               percpu_unary_op(size, qual, "dec", var);                \
> +       else                                                            \
> +               percpu_to_op(size, qual, "add", var, val);              \

Nice simplification, and it was clever of you to reuse percpu_to_op()
for the case of addition!

In terms of the 0day bot test failures:
>> include/linux/percpu-rwsem.h:91:17: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> include/net/tcp.h:1521:9: sparse: sparse: cast truncates bits from constant value (1d4c0 becomes d4c0)
>> include/net/sch_generic.h:888:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
>> drivers/infiniband/hw/hfi1/pio_copy.c:164:9: sparse: sparse: cast truncates bits from constant value (ffffffff becomes ffff)
which is reminiscent of our discussion in:
https://lore.kernel.org/lkml/CAMzpN2gu4stkRKTsMTVxyzckO3SMhfA+dmCnSu6-aMg5QAA_JQ@mail.gmail.com/

I'm not able to repro with i386_defconfig, but it looks like one
report was i386_allmodconfig, the other was (64b) allyesconfig.
I am able to repro with:
$ rm -f drivers/infiniband/hw/hfi1/pio_copy.o
$ make -j71 CC=clang allyesconfig
$ make -j71 CC=clang C=1 drivers/infiniband/hw/hfi1/pio_copy.o

It's not immediately clear to me from this diff which cast sparse is
complaining about.  Looking at the report and expanding this_cpu_dec()
seems to eventually expand to `this_cpu_add_<number>, so the reporting
does look potentially against the right patch in the series.  (Seeing
`(typeof((ptr) + 0))NULL` in __verify_pcpu_ptr() in
include/linux/percpu-defs.h is a real WTF, though the comment
explains).

Indeed, if I checkout before the series is applied, the warnings
disappear.  So let's follow up tomorrow on where this is coming from.

>  } while (0)
>
>  #define percpu_from_op(size, qual, op, _var)                           \
> @@ -228,29 +196,6 @@ do {                                                                       \
>         pfo_ret__;                                      \
>  })
>
> -#define percpu_unary_op(qual, op, var)                 \
> -({                                                     \
> -       switch (sizeof(var)) {                          \
> -       case 1:                                         \
> -               asm qual (op "b "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 2:                                         \
> -               asm qual (op "w "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 4:                                         \
> -               asm qual (op "l "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 8:                                         \
> -               asm qual (op "q "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       default: __bad_percpu_size();                   \
> -       }                                               \
> -})
> -
>  /*
>   * Add return operation
>   */
> @@ -388,9 +333,9 @@ do {                                                                        \
>  #define raw_cpu_write_1(pcp, val)      percpu_to_op(1, , "mov", (pcp), val)
>  #define raw_cpu_write_2(pcp, val)      percpu_to_op(2, , "mov", (pcp), val)
>  #define raw_cpu_write_4(pcp, val)      percpu_to_op(4, , "mov", (pcp), val)
> -#define raw_cpu_add_1(pcp, val)                percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_2(pcp, val)                percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_4(pcp, val)                percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_1(pcp, val)                percpu_add_op(1, , (pcp), val)
> +#define raw_cpu_add_2(pcp, val)                percpu_add_op(2, , (pcp), val)
> +#define raw_cpu_add_4(pcp, val)                percpu_add_op(4, , (pcp), val)
>  #define raw_cpu_and_1(pcp, val)                percpu_to_op(1, , "and", (pcp), val)
>  #define raw_cpu_and_2(pcp, val)                percpu_to_op(2, , "and", (pcp), val)
>  #define raw_cpu_and_4(pcp, val)                percpu_to_op(4, , "and", (pcp), val)
> @@ -419,9 +364,9 @@ do {                                                                        \
>  #define this_cpu_write_1(pcp, val)     percpu_to_op(1, volatile, "mov", (pcp), val)
>  #define this_cpu_write_2(pcp, val)     percpu_to_op(2, volatile, "mov", (pcp), val)
>  #define this_cpu_write_4(pcp, val)     percpu_to_op(4, volatile, "mov", (pcp), val)
> -#define this_cpu_add_1(pcp, val)       percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_2(pcp, val)       percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_4(pcp, val)       percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_1(pcp, val)       percpu_add_op(1, volatile, (pcp), val)
> +#define this_cpu_add_2(pcp, val)       percpu_add_op(2, volatile, (pcp), val)
> +#define this_cpu_add_4(pcp, val)       percpu_add_op(4, volatile, (pcp), val)
>  #define this_cpu_and_1(pcp, val)       percpu_to_op(1, volatile, "and", (pcp), val)
>  #define this_cpu_and_2(pcp, val)       percpu_to_op(2, volatile, "and", (pcp), val)
>  #define this_cpu_and_4(pcp, val)       percpu_to_op(4, volatile, "and", (pcp), val)
> @@ -470,7 +415,7 @@ do {                                                                        \
>  #ifdef CONFIG_X86_64
>  #define raw_cpu_read_8(pcp)                    percpu_from_op(8, , "mov", pcp)
>  #define raw_cpu_write_8(pcp, val)              percpu_to_op(8, , "mov", (pcp), val)
> -#define raw_cpu_add_8(pcp, val)                        percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_8(pcp, val)                        percpu_add_op(8, , (pcp), val)
>  #define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
>  #define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
>  #define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(, pcp, val)
> @@ -479,7 +424,7 @@ do {                                                                        \
>
>  #define this_cpu_read_8(pcp)                   percpu_from_op(8, volatile, "mov", pcp)
>  #define this_cpu_write_8(pcp, val)             percpu_to_op(8, volatile, "mov", (pcp), val)
> -#define this_cpu_add_8(pcp, val)               percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_8(pcp, val)               percpu_add_op(8, volatile, (pcp), val)
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
>  #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op()
  2020-05-18 22:46   ` Nick Desaulniers
@ 2020-05-18 23:45     ` Brian Gerst
  0 siblings, 0 replies; 24+ messages in thread
From: Brian Gerst @ 2020-05-18 23:45 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Mon, May 18, 2020 at 6:46 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > The core percpu macros already have a switch on the data size, so the switch
> > in the x86 code is redundant and produces more dead code.
> >
> > Also use appropriate types for the width of the instructions.  This avoids
> > errors when compiling with Clang.
> >
> > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > ---
> >  arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
> >  1 file changed, 16 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index 21c5013a681a..ac8c391a190e 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -199,34 +199,15 @@ do {                                                                      \
> >  /*
> >   * Add return operation
> >   */
> > -#define percpu_add_return_op(qual, var, val)                           \
> > +#define percpu_add_return_op(size, qual, _var, _val)                   \
> >  ({                                                                     \
> > -       typeof(var) paro_ret__ = val;                                   \
> > -       switch (sizeof(var)) {                                          \
> > -       case 1:                                                         \
> > -               asm qual ("xaddb %0, "__percpu_arg(1)                   \
> > -                           : "+q" (paro_ret__), "+m" (var)             \
> > -                           : : "memory");                              \
> > -               break;                                                  \
> > -       case 2:                                                         \
> > -               asm qual ("xaddw %0, "__percpu_arg(1)                   \
> > -                           : "+r" (paro_ret__), "+m" (var)             \
> > -                           : : "memory");                              \
> > -               break;                                                  \
> > -       case 4:                                                         \
> > -               asm qual ("xaddl %0, "__percpu_arg(1)                   \
> > -                           : "+r" (paro_ret__), "+m" (var)             \
> > -                           : : "memory");                              \
> > -               break;                                                  \
> > -       case 8:                                                         \
> > -               asm qual ("xaddq %0, "__percpu_arg(1)                   \
> > -                           : "+re" (paro_ret__), "+m" (var)            \
>
> ^ before we use the "+re" constraint for 8B input.
>
> > -                           : : "memory");                              \
> > -               break;                                                  \
> > -       default: __bad_percpu_size();                                   \
>
> Comment on the series as a whole.  After applying the series, the
> final reference to __bad_percpu_size and switch statement in
> arch/x86/include/asm/percpu.h in the definition of the
> percpu_stable_op() macro.  If you clean that up, too, then the rest of
> this file feels more consistent with your series, even if it's not a
> blocker for Clang i386 support. Then you can get rid of
> __bad_percpu_size, too!

I haven't yet figured out what to do with percpu_stable_op().  It's
x86-specific, so there isn't another switch in the core code.  I think
it is supposed to be similar to READ_ONCE() but for percpu variables,
but I'm not 100% sure.

> > -       }                                                               \
> > -       paro_ret__ += val;                                              \
> > -       paro_ret__;                                                     \
> > +       __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);       \
> > +       asm qual (__pcpu_op2_##size("xadd", "%[tmp]",                   \
> > +                                    __percpu_arg([var]))               \
> > +                 : [tmp] __pcpu_reg_##size("+", paro_tmp__),           \
>
> ^ after, for `size == 8`, we use "+r". [0] says for "e":
>
> 32-bit signed integer constant, or a symbolic reference known to fit
> that range (for immediate operands in sign-extending x86-64
> instructions).
>
> I'm guessing we're restricting the input to not allow for 64b signed
> integer constants?  Looking at the documentation for `xadd` (ie.
> "exchange and add") [1], it looks like immediates are not allowed as
> operands, only registers or memory addresses.  So it seems that "e"
> was never necessary.  It might be helpful to note that in the commit
> message, should you end up sending a v2 of the series.  Maybe some
> folks with more x86 inline asm experience can triple check/verify?

That is correct.  The "e" constraint shouldn't have been there, since
XADD doesn't allow immediates.  I'll make that clearer in V2.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] x86/percpu: Introduce size abstraction macros
  2020-05-17 15:29 ` [PATCH 1/7] x86/percpu: Introduce size abstraction macros Brian Gerst
@ 2020-05-18 23:48   ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 23:48 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> In preparation for cleaning up the percpu operations, define macros for
> abstraction based on the width of the operation.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 2278797c769d..89f918a3e99b 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -87,6 +87,36 @@
>   * don't give an lvalue though). */
>  extern void __bad_percpu_size(void);
>
> +#define __pcpu_type_1 u8
> +#define __pcpu_type_2 u16
> +#define __pcpu_type_4 u32
> +#define __pcpu_type_8 u64
> +
> +#define __pcpu_cast_1(val) ((u8)((unsigned long) val))
> +#define __pcpu_cast_2(val) ((u16)((unsigned long) val))
> +#define __pcpu_cast_4(val) ((u32)((unsigned long) val))
> +#define __pcpu_cast_8(val) ((u64)(val))
> +
> +#define __pcpu_op1_1(op, dst) op "b " dst
> +#define __pcpu_op1_2(op, dst) op "w " dst
> +#define __pcpu_op1_4(op, dst) op "l " dst
> +#define __pcpu_op1_8(op, dst) op "q " dst
> +
> +#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
> +#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
> +#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
> +#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst

`op1` and `op2` aren't the most descriptive, though we kind of would
like terseness here.  I guess "op1"s have 1 operand, and "op2"s have 2
operands.

> +
> +#define __pcpu_reg_1(out, x) out "q" (x)
> +#define __pcpu_reg_2(out, x) out "r" (x)
> +#define __pcpu_reg_4(out, x) out "r" (x)
> +#define __pcpu_reg_8(out, x) out "r" (x)

I think `mod` is more descriptive than `out`, as there are modifiers.
https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers

I don't want to bikeshed, since the naming changes potentially mean
touching each patch.  Just food for thought in case other reviewers
agree/disagree.  So I'll just add:
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> +
> +#define __pcpu_reg_imm_1(x) "qi" (x)
> +#define __pcpu_reg_imm_2(x) "ri" (x)
> +#define __pcpu_reg_imm_4(x) "ri" (x)
> +#define __pcpu_reg_imm_8(x) "re" (x)
> +
>  #define percpu_to_op(qual, op, var, val)               \
>  do {                                                   \
>         typedef typeof(var) pto_T__;                    \
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] x86: Clean up percpu operations
  2020-05-18 19:19 ` [PATCH 0/7] x86: Clean up percpu operations Nick Desaulniers
@ 2020-05-18 23:54   ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-18 23:54 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Mon, May 18, 2020 at 12:19 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > The core percpu operations already have a switch on the width of the
> > data type, which resulted in an extra amount of dead code being
>
> Thanks for the series Brian.  The duplication between knowing the size
> at the call site and the switch is exactly what I picked up on, too.
> This past weekend I started a similar cleanup, but only got through 2
> of the 6 functions you cleaned up.  And via quick glance, your series
> looks much better than mine does.  I'll sit down and review+test these
> after lunch.  I appreciate you taking the time to create this series.

Patch set looks good.  I tested:
* i386_defconfig
* defconfig
* allyesconfig
with ToT Clang and GCC 9.2.1.  Booted the first two in QEMU as a smoke
test as well.

Note that for Clang, I also needed to apply this hunk to
arch/x86/include/asm/uaccess.h
(https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch)
which will be the final issue for i386_defconfig+clang.

I think other than the 0day report on patch 4/7, and the use of
__same_type, the series is mostly good to go.  You can add my:
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
to any v2 of the whole series.

I generally find the use of named assembly operands and a few of the
simplifications to be a nice touch, which I pointed out per patch.

Thanks again for shaving this yak.

>
> > generated with the x86 operations having another switch.  This patch set
> > rewrites the x86 ops to remove the switch.  Additional cleanups are to
> > use named assembly operands, and to cast variables to the width used in
> > the assembly to make Clang happy.
> >
> > Brian Gerst (7):
> >   x86/percpu: Introduce size abstraction macros
> >   x86/percpu: Clean up percpu_to_op()
> >   x86/percpu: Clean up percpu_from_op()
> >   x86/percpu: Clean up percpu_add_op()
> >   x86/percpu: Clean up percpu_add_return_op()
> >   x86/percpu: Clean up percpu_xchg_op()
> >   x86/percpu: Clean up percpu_cmpxchg_op()
> >
> >  arch/x86/include/asm/percpu.h | 447 ++++++++++++----------------------
> >  1 file changed, 158 insertions(+), 289 deletions(-)
> >
> > --

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-18 21:15   ` Nick Desaulniers
@ 2020-05-19  3:38     ` Brian Gerst
  2020-05-20 17:26       ` Nick Desaulniers
  0 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-19  3:38 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Mon, May 18, 2020 at 5:15 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > The core percpu macros already have a switch on the data size, so the switch
> > in the x86 code is redundant and produces more dead code.
> >
> > Also use appropriate types for the width of the instructions.  This avoids
> > errors when compiling with Clang.
> >
> > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > ---
> >  arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
> >  1 file changed, 35 insertions(+), 55 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index 89f918a3e99b..233c7a78d1a6 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
> >  #define __pcpu_reg_imm_4(x) "ri" (x)
> >  #define __pcpu_reg_imm_8(x) "re" (x)
> >
> > -#define percpu_to_op(qual, op, var, val)               \
> > -do {                                                   \
> > -       typedef typeof(var) pto_T__;                    \
> > -       if (0) {                                        \
> > -               pto_T__ pto_tmp__;                      \
> > -               pto_tmp__ = (val);                      \
> > -               (void)pto_tmp__;                        \
> > -       }                                               \
> > -       switch (sizeof(var)) {                          \
> > -       case 1:                                         \
> > -               asm qual (op "b %1,"__percpu_arg(0)     \
> > -                   : "+m" (var)                        \
> > -                   : "qi" ((pto_T__)(val)));           \
> > -               break;                                  \
> > -       case 2:                                         \
> > -               asm qual (op "w %1,"__percpu_arg(0)     \
> > -                   : "+m" (var)                        \
> > -                   : "ri" ((pto_T__)(val)));           \
> > -               break;                                  \
> > -       case 4:                                         \
> > -               asm qual (op "l %1,"__percpu_arg(0)     \
> > -                   : "+m" (var)                        \
> > -                   : "ri" ((pto_T__)(val)));           \
> > -               break;                                  \
> > -       case 8:                                         \
> > -               asm qual (op "q %1,"__percpu_arg(0)     \
> > -                   : "+m" (var)                        \
> > -                   : "re" ((pto_T__)(val)));           \
> > -               break;                                  \
> > -       default: __bad_percpu_size();                   \
> > -       }                                               \
> > +#define percpu_to_op(size, qual, op, _var, _val)                       \
> > +do {                                                                   \
> > +       __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);        \
> > +       if (0) {                                                        \
> > +               typeof(_var) pto_tmp__;                                 \
> > +               pto_tmp__ = (_val);                                     \
> > +               (void)pto_tmp__;                                        \
> > +       }                                                               \
>
> Please replace the whole `if (0)` block with:
> ```c
> __same_type(_var, _val);
> ```
> from include/linux/compiler.h.

The problem with __builtin_types_compatible_p() is that it considers
unsigned long and u64 (aka unsigned long long) as different types even
though they are the same width on x86-64.  While this may be a good
cleanup to look at in the future, it's not a simple drop-in
replacement.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-19  3:38     ` Brian Gerst
@ 2020-05-20 17:26       ` Nick Desaulniers
  2020-05-21 13:06         ` Brian Gerst
  0 siblings, 1 reply; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-20 17:26 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Mon, May 18, 2020 at 8:38 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Mon, May 18, 2020 at 5:15 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > The core percpu macros already have a switch on the data size, so the switch
> > > in the x86 code is redundant and produces more dead code.
> > >
> > > Also use appropriate types for the width of the instructions.  This avoids
> > > errors when compiling with Clang.
> > >
> > > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > > ---
> > >  arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
> > >  1 file changed, 35 insertions(+), 55 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > index 89f918a3e99b..233c7a78d1a6 100644
> > > --- a/arch/x86/include/asm/percpu.h
> > > +++ b/arch/x86/include/asm/percpu.h
> > > @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
> > >  #define __pcpu_reg_imm_4(x) "ri" (x)
> > >  #define __pcpu_reg_imm_8(x) "re" (x)
> > >
> > > -#define percpu_to_op(qual, op, var, val)               \
> > > -do {                                                   \
> > > -       typedef typeof(var) pto_T__;                    \
> > > -       if (0) {                                        \
> > > -               pto_T__ pto_tmp__;                      \
> > > -               pto_tmp__ = (val);                      \
> > > -               (void)pto_tmp__;                        \
> > > -       }                                               \
> > > -       switch (sizeof(var)) {                          \
> > > -       case 1:                                         \
> > > -               asm qual (op "b %1,"__percpu_arg(0)     \
> > > -                   : "+m" (var)                        \
> > > -                   : "qi" ((pto_T__)(val)));           \
> > > -               break;                                  \
> > > -       case 2:                                         \
> > > -               asm qual (op "w %1,"__percpu_arg(0)     \
> > > -                   : "+m" (var)                        \
> > > -                   : "ri" ((pto_T__)(val)));           \
> > > -               break;                                  \
> > > -       case 4:                                         \
> > > -               asm qual (op "l %1,"__percpu_arg(0)     \
> > > -                   : "+m" (var)                        \
> > > -                   : "ri" ((pto_T__)(val)));           \
> > > -               break;                                  \
> > > -       case 8:                                         \
> > > -               asm qual (op "q %1,"__percpu_arg(0)     \
> > > -                   : "+m" (var)                        \
> > > -                   : "re" ((pto_T__)(val)));           \
> > > -               break;                                  \
> > > -       default: __bad_percpu_size();                   \
> > > -       }                                               \
> > > +#define percpu_to_op(size, qual, op, _var, _val)                       \
> > > +do {                                                                   \
> > > +       __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);        \
> > > +       if (0) {                                                        \
> > > +               typeof(_var) pto_tmp__;                                 \
> > > +               pto_tmp__ = (_val);                                     \
> > > +               (void)pto_tmp__;                                        \
> > > +       }                                                               \
> >
> > Please replace the whole `if (0)` block with:
> > ```c
> > __same_type(_var, _val);
> > ```
> > from include/linux/compiler.h.
>
> The problem with __builtin_types_compatible_p() is that it considers
> unsigned long and u64 (aka unsigned long long) as different types even
> though they are the same width on x86-64.  While this may be a good
> cleanup to look at in the future, it's not a simple drop-in
> replacement.

Does it trigger errors in this case?

It's interesting to know how this trick differs from
__builtin_types_compatible_p().  Might even be helpful to wrap this
pattern in a macro with a comment with the pros/cons of this approach
vs __same_type.

On the other hand, the use of `long` seems tricky in x86 code as x86
(32b) is ILP32 but x86_64 (64b) is LP64.  So the use of `long` is
ambiguous in the sense that it's a different size depending on the
target ABI.  Wouldn't it potentially be a bug for x86 kernel code to
use `long` percpu variables (or rather mix `long` and `long long` in
the same operation) in that case, since the sizes of the two would be
different for i386?
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-20 17:26       ` Nick Desaulniers
@ 2020-05-21 13:06         ` Brian Gerst
  2020-05-26 17:54           ` Nick Desaulniers
  0 siblings, 1 reply; 24+ messages in thread
From: Brian Gerst @ 2020-05-21 13:06 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Wed, May 20, 2020 at 1:26 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, May 18, 2020 at 8:38 PM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Mon, May 18, 2020 at 5:15 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> > > >
> > > > The core percpu macros already have a switch on the data size, so the switch
> > > > in the x86 code is redundant and produces more dead code.
> > > >
> > > > Also use appropriate types for the width of the instructions.  This avoids
> > > > errors when compiling with Clang.
> > > >
> > > > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > > > ---
> > > >  arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
> > > >  1 file changed, 35 insertions(+), 55 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > > index 89f918a3e99b..233c7a78d1a6 100644
> > > > --- a/arch/x86/include/asm/percpu.h
> > > > +++ b/arch/x86/include/asm/percpu.h
> > > > @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
> > > >  #define __pcpu_reg_imm_4(x) "ri" (x)
> > > >  #define __pcpu_reg_imm_8(x) "re" (x)
> > > >
> > > > -#define percpu_to_op(qual, op, var, val)               \
> > > > -do {                                                   \
> > > > -       typedef typeof(var) pto_T__;                    \
> > > > -       if (0) {                                        \
> > > > -               pto_T__ pto_tmp__;                      \
> > > > -               pto_tmp__ = (val);                      \
> > > > -               (void)pto_tmp__;                        \
> > > > -       }                                               \
> > > > -       switch (sizeof(var)) {                          \
> > > > -       case 1:                                         \
> > > > -               asm qual (op "b %1,"__percpu_arg(0)     \
> > > > -                   : "+m" (var)                        \
> > > > -                   : "qi" ((pto_T__)(val)));           \
> > > > -               break;                                  \
> > > > -       case 2:                                         \
> > > > -               asm qual (op "w %1,"__percpu_arg(0)     \
> > > > -                   : "+m" (var)                        \
> > > > -                   : "ri" ((pto_T__)(val)));           \
> > > > -               break;                                  \
> > > > -       case 4:                                         \
> > > > -               asm qual (op "l %1,"__percpu_arg(0)     \
> > > > -                   : "+m" (var)                        \
> > > > -                   : "ri" ((pto_T__)(val)));           \
> > > > -               break;                                  \
> > > > -       case 8:                                         \
> > > > -               asm qual (op "q %1,"__percpu_arg(0)     \
> > > > -                   : "+m" (var)                        \
> > > > -                   : "re" ((pto_T__)(val)));           \
> > > > -               break;                                  \
> > > > -       default: __bad_percpu_size();                   \
> > > > -       }                                               \
> > > > +#define percpu_to_op(size, qual, op, _var, _val)                       \
> > > > +do {                                                                   \
> > > > +       __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);        \
> > > > +       if (0) {                                                        \
> > > > +               typeof(_var) pto_tmp__;                                 \
> > > > +               pto_tmp__ = (_val);                                     \
> > > > +               (void)pto_tmp__;                                        \
> > > > +       }                                                               \
> > >
> > > Please replace the whole `if (0)` block with:
> > > ```c
> > > __same_type(_var, _val);
> > > ```
> > > from include/linux/compiler.h.
> >
> > The problem with __builtin_types_compatible_p() is that it considers
> > unsigned long and u64 (aka unsigned long long) as different types even
> > though they are the same width on x86-64.  While this may be a good
> > cleanup to look at in the future, it's not a simple drop-in
> > replacement.
>
> Does it trigger errors in this case?

Yes, see boot_init_stack_canary().  That code looks a bit sketchy but
it's not wrong, for x86-64 at least.

It also doesn't seem to like "void *" compared to any other pointer type:

In function ‘fpregs_deactivate’,
    inlined from ‘fpu__drop’ at arch/x86/kernel/fpu/core.c:285:3:
./include/linux/compiler.h:379:38: error: call to
‘__compiletime_assert_317’ declared with attribute error: BUILD_BUG_ON
failed: !__same_type((fpu_fpregs_owner_ctx), ((void *)0))
  379 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                      ^
./include/linux/compiler.h:360:4: note: in definition of macro
‘__compiletime_assert’
  360 |    prefix ## suffix();    \
      |    ^~~~~~
./include/linux/compiler.h:379:2: note: in expansion of macro
‘_compiletime_assert’
  379 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro
‘compiletime_assert’
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
   50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      |  ^~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:105:2: note: in expansion of macro
‘BUILD_BUG_ON’
  105 |  BUILD_BUG_ON(!__same_type(_var, _val));    \
      |  ^~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:338:37: note: in expansion of macro
‘percpu_to_op’
  338 | #define this_cpu_write_8(pcp, val)  percpu_to_op(8, volatile,
"mov", (pcp), val)
      |                                     ^~~~~~~~~~~~
./include/linux/percpu-defs.h:380:11: note: in expansion of macro
‘this_cpu_write_8’
  380 |   case 8: stem##8(variable, __VA_ARGS__);break;  \
      |           ^~~~
./include/linux/percpu-defs.h:508:34: note: in expansion of macro
‘__pcpu_size_call’
  508 | #define this_cpu_write(pcp, val)
__pcpu_size_call(this_cpu_write_, pcp, val)
      |                                  ^~~~~~~~~~~~~~~~
./arch/x86/include/asm/fpu/internal.h:525:2: note: in expansion of
macro ‘this_cpu_write’
  525 |  this_cpu_write(fpu_fpregs_owner_ctx, NULL);
      |  ^~~~~~~~~~~~~~

>
> It's interesting to know how this trick differs from
> __builtin_types_compatible_p().  Might even be helpful to wrap this
> pattern in a macro with a comment with the pros/cons of this approach
> vs __same_type.

I think the original code is more to catch a mismatch between pointers
and integers.  It doesn't seem to care about truncation

> On the other hand, the use of `long` seems tricky in x86 code as x86
> (32b) is ILP32 but x86_64 (64b) is LP64.  So the use of `long` is
> ambiguous in the sense that it's a different size depending on the
> target ABI.  Wouldn't it potentially be a bug for x86 kernel code to
> use `long` percpu variables (or rather mix `long` and `long long` in
> the same operation) in that case, since the sizes of the two would be
> different for i386?

Not necessarily.  Some things like registers are naturally 32-bit on a
32-bit kernel and 64-bit on a 64-bit kernel, so 'long' is appropriate
there.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()
  2020-05-21 13:06         ` Brian Gerst
@ 2020-05-26 17:54           ` Nick Desaulniers
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Desaulniers @ 2020-05-26 17:54 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Thu, May 21, 2020 at 6:06 AM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Wed, May 20, 2020 at 1:26 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Mon, May 18, 2020 at 8:38 PM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > On Mon, May 18, 2020 at 5:15 PM Nick Desaulniers
> > > <ndesaulniers@google.com> wrote:
> > > >
> > > > On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@gmail.com> wrote:
> > > > >
> > > > > The core percpu macros already have a switch on the data size, so the switch
> > > > > in the x86 code is redundant and produces more dead code.
> > > > >
> > > > > Also use appropriate types for the width of the instructions.  This avoids
> > > > > errors when compiling with Clang.
> > > > >
> > > > > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > > > > ---
> > > > >  arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
> > > > >  1 file changed, 35 insertions(+), 55 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > > > index 89f918a3e99b..233c7a78d1a6 100644
> > > > > --- a/arch/x86/include/asm/percpu.h
> > > > > +++ b/arch/x86/include/asm/percpu.h
> > > > > @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
> > > > >  #define __pcpu_reg_imm_4(x) "ri" (x)
> > > > >  #define __pcpu_reg_imm_8(x) "re" (x)
> > > > >
> > > > > -#define percpu_to_op(qual, op, var, val)               \
> > > > > -do {                                                   \
> > > > > -       typedef typeof(var) pto_T__;                    \
> > > > > -       if (0) {                                        \
> > > > > -               pto_T__ pto_tmp__;                      \
> > > > > -               pto_tmp__ = (val);                      \
> > > > > -               (void)pto_tmp__;                        \
> > > > > -       }                                               \
> > > > > -       switch (sizeof(var)) {                          \
> > > > > -       case 1:                                         \
> > > > > -               asm qual (op "b %1,"__percpu_arg(0)     \
> > > > > -                   : "+m" (var)                        \
> > > > > -                   : "qi" ((pto_T__)(val)));           \
> > > > > -               break;                                  \
> > > > > -       case 2:                                         \
> > > > > -               asm qual (op "w %1,"__percpu_arg(0)     \
> > > > > -                   : "+m" (var)                        \
> > > > > -                   : "ri" ((pto_T__)(val)));           \
> > > > > -               break;                                  \
> > > > > -       case 4:                                         \
> > > > > -               asm qual (op "l %1,"__percpu_arg(0)     \
> > > > > -                   : "+m" (var)                        \
> > > > > -                   : "ri" ((pto_T__)(val)));           \
> > > > > -               break;                                  \
> > > > > -       case 8:                                         \
> > > > > -               asm qual (op "q %1,"__percpu_arg(0)     \
> > > > > -                   : "+m" (var)                        \
> > > > > -                   : "re" ((pto_T__)(val)));           \
> > > > > -               break;                                  \
> > > > > -       default: __bad_percpu_size();                   \
> > > > > -       }                                               \
> > > > > +#define percpu_to_op(size, qual, op, _var, _val)                       \
> > > > > +do {                                                                   \
> > > > > +       __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);        \
> > > > > +       if (0) {                                                        \
> > > > > +               typeof(_var) pto_tmp__;                                 \
> > > > > +               pto_tmp__ = (_val);                                     \
> > > > > +               (void)pto_tmp__;                                        \
> > > > > +       }                                                               \
> > > >
> > > > Please replace the whole `if (0)` block with:
> > > > ```c
> > > > __same_type(_var, _val);
> > > > ```
> > > > from include/linux/compiler.h.
> > >
> > > The problem with __builtin_types_compatible_p() is that it considers
> > > unsigned long and u64 (aka unsigned long long) as different types even
> > > though they are the same width on x86-64.  While this may be a good
> > > cleanup to look at in the future, it's not a simple drop-in
> > > replacement.
> >
> > Does it trigger errors in this case?
>
> Yes, see boot_init_stack_canary().  That code looks a bit sketchy but
> it's not wrong, for x86-64 at least.
>
> It also doesn't seem to like "void *" compared to any other pointer type:
>
> In function ‘fpregs_deactivate’,
>     inlined from ‘fpu__drop’ at arch/x86/kernel/fpu/core.c:285:3:
> ./include/linux/compiler.h:379:38: error: call to
> ‘__compiletime_assert_317’ declared with attribute error: BUILD_BUG_ON
> failed: !__same_type((fpu_fpregs_owner_ctx), ((void *)0))
>   379 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>       |                                      ^
> ./include/linux/compiler.h:360:4: note: in definition of macro
> ‘__compiletime_assert’
>   360 |    prefix ## suffix();    \
>       |    ^~~~~~
> ./include/linux/compiler.h:379:2: note: in expansion of macro
> ‘_compiletime_assert’
>   379 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>       |  ^~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:39:37: note: in expansion of macro
> ‘compiletime_assert’
>    39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>       |                                     ^~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
>    50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>       |  ^~~~~~~~~~~~~~~~
> ./arch/x86/include/asm/percpu.h:105:2: note: in expansion of macro
> ‘BUILD_BUG_ON’
>   105 |  BUILD_BUG_ON(!__same_type(_var, _val));    \
>       |  ^~~~~~~~~~~~
> ./arch/x86/include/asm/percpu.h:338:37: note: in expansion of macro
> ‘percpu_to_op’
>   338 | #define this_cpu_write_8(pcp, val)  percpu_to_op(8, volatile,
> "mov", (pcp), val)
>       |                                     ^~~~~~~~~~~~
> ./include/linux/percpu-defs.h:380:11: note: in expansion of macro
> ‘this_cpu_write_8’
>   380 |   case 8: stem##8(variable, __VA_ARGS__);break;  \
>       |           ^~~~
> ./include/linux/percpu-defs.h:508:34: note: in expansion of macro
> ‘__pcpu_size_call’
>   508 | #define this_cpu_write(pcp, val)
> __pcpu_size_call(this_cpu_write_, pcp, val)
>       |                                  ^~~~~~~~~~~~~~~~
> ./arch/x86/include/asm/fpu/internal.h:525:2: note: in expansion of
> macro ‘this_cpu_write’
>   525 |  this_cpu_write(fpu_fpregs_owner_ctx, NULL);
>       |  ^~~~~~~~~~~~~~
>
> >
> > It's interesting to know how this trick differs from
> > __builtin_types_compatible_p().  Might even be helpful to wrap this
> > pattern in a macro with a comment with the pros/cons of this approach
> > vs __same_type.
>
> I think the original code is more to catch a mismatch between pointers
> and integers.  It doesn't seem to care about truncation
>
> > On the other hand, the use of `long` seems tricky in x86 code as x86
> > (32b) is ILP32 but x86_64 (64b) is LP64.  So the use of `long` is
> > ambiguous in the sense that it's a different size depending on the
> > target ABI.  Wouldn't it potentially be a bug for x86 kernel code to
> > use `long` percpu variables (or rather mix `long` and `long long` in
> > the same operation) in that case, since the sizes of the two would be
> > different for i386?
>
> Not necessarily.  Some things like registers are naturally 32-bit on a
> 32-bit kernel and 64-bit on a 64-bit kernel, so 'long' is appropriate
> there.

Sorry for not getting back to this sooner, amazing how fast emails get
buried in an inbox.

Interesting findings.  Feels almost like a _Static_assert that the
sizeof these types match might be more straightforward, but I don't
need to nitpick pre-existing code that this patch simply carries
forward.  I realized I never signed off on this.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

It looks like there's still an outstanding issue with patch 4/7?
https://lore.kernel.org/lkml/CAKwvOdnVU3kZnGzkYjEFJWMPuVjOmAHuRSB8FJ-Ks+FWzX2M_Q@mail.gmail.com/
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-05-26 17:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-17 15:29 [PATCH 0/7] x86: Clean up percpu operations Brian Gerst
2020-05-17 15:29 ` [PATCH 1/7] x86/percpu: Introduce size abstraction macros Brian Gerst
2020-05-18 23:48   ` Nick Desaulniers
2020-05-17 15:29 ` [PATCH 2/7] x86/percpu: Clean up percpu_to_op() Brian Gerst
2020-05-18 21:15   ` Nick Desaulniers
2020-05-19  3:38     ` Brian Gerst
2020-05-20 17:26       ` Nick Desaulniers
2020-05-21 13:06         ` Brian Gerst
2020-05-26 17:54           ` Nick Desaulniers
2020-05-17 15:29 ` [PATCH 3/7] x86/percpu: Clean up percpu_from_op() Brian Gerst
2020-05-18 21:27   ` Nick Desaulniers
2020-05-17 15:29 ` [PATCH 4/7] x86/percpu: Clean up percpu_add_op() Brian Gerst
2020-05-17 19:18   ` kbuild test robot
2020-05-17 20:04   ` kbuild test robot
2020-05-18 23:42   ` Nick Desaulniers
2020-05-17 15:29 ` [PATCH 5/7] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
2020-05-18 22:46   ` Nick Desaulniers
2020-05-18 23:45     ` Brian Gerst
2020-05-17 15:29 ` [PATCH 6/7] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
2020-05-18 22:15   ` Nick Desaulniers
2020-05-17 15:29 ` [PATCH 7/7] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
2020-05-18 22:28   ` Nick Desaulniers
2020-05-18 19:19 ` [PATCH 0/7] x86: Clean up percpu operations Nick Desaulniers
2020-05-18 23:54   ` Nick Desaulniers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).