linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/10] x86: Clean up percpu operations
@ 2020-05-30 22:11 Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros Brian Gerst
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu operations already switch on the width of the data
type, so the extra switch in the x86 operations is redundant and causes
dead code to be generated.  This patch set rewrites the x86 ops to
remove that switch.  Additional cleanups are to use named assembly
operands, and to cast variables to the width used in the assembly to
make Clang happy.
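
To illustrate the approach, here is a minimal user-space sketch (not
kernel code; the my_type_*/my_add names are made up): instead of a
switch (sizeof(var)) in every operation, the caller passes the width as
a token and the preprocessor picks the matching fixed-width type.

#include <stdint.h>
#include <stdio.h>

#define my_type_1 uint8_t
#define my_type_2 uint16_t
#define my_type_4 uint32_t
#define my_type_8 uint64_t

/* One definition covers every width: token pasting selects the type. */
#define my_add(size, var, val)						\
do {									\
	my_type_##size tmp__ = (my_type_##size)(unsigned long)(val);	\
	(var) += tmp__;							\
} while (0)

int main(void)
{
	uint32_t x = 1;

	my_add(4, x, 2);	/* expands with my_type_4 == uint32_t */
	printf("%u\n", x);	/* prints 3 */
	return 0;
}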

Changes from v1:
- Add separate patch for XADD constraint fix
- Fixed sparse truncation warning
- Add cleanup of percpu_stable_op()
- Add patch to remove PER_CPU()

Brian Gerst (10):
  x86/percpu: Introduce size abstraction macros
  x86/percpu: Clean up percpu_to_op()
  x86/percpu: Clean up percpu_from_op()
  x86/percpu: Clean up percpu_add_op()
  x86/percpu: Remove "e" constraint from XADD
  x86/percpu: Clean up percpu_add_return_op()
  x86/percpu: Clean up percpu_xchg_op()
  x86/percpu: Clean up percpu_cmpxchg_op()
  x86/percpu: Clean up percpu_stable_op()
  x86/percpu: Remove unused PER_CPU() macro

 arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
 1 file changed, 172 insertions(+), 338 deletions(-)


base-commit: 229aaa8c059f2c908e0561453509f996f2b2d5c4
-- 
2.25.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 20:48   ` Nick Desaulniers
  2020-05-30 22:11 ` [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op() Brian Gerst
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.
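
For example, once these helpers exist the instruction template is
assembled entirely by the preprocessor.  A quick user-space check (not
kernel code; the hand-written "%%gs:%[var]" below merely stands in for
what __percpu_arg([var]) produces on a 64-bit SMP build) shows the kind
of string the later patches hand to asm():

#include <stdio.h>

#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst

int main(void)
{
	puts(__pcpu_op2_4("mov", "%[val]", "%%gs:%[var]"));
	/* prints: movl %[val], %%gs:%[var] */
	return 0;
}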

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797c769d..19838e4f7a8f 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
 
+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
+#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
+#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(mod, x) mod "q" (x)
+#define __pcpu_reg_2(mod, x) mod "r" (x)
+#define __pcpu_reg_4(mod, x) mod "r" (x)
+#define __pcpu_reg_8(mod, x) mod "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
 #define percpu_to_op(qual, op, var, val)		\
 do {							\
 	typedef typeof(var) pto_T__;			\
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-07-09 10:30   ` Peter Zijlstra
  2020-05-30 22:11 ` [PATCH v2 03/10] x86/percpu: Clean up percpu_from_op() Brian Gerst
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
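
As a reference point, a stripped-down user-space analogue of the new
form (ordinary memory instead of a %gs-based per-cpu address; all names
here are invented) shows the shape of the result: one asm statement
with named operands and a fixed-width temporary, instead of four switch
cases:

#include <stdint.h>
#include <stdio.h>

#define op2_4(op, src, dst)	op "l " src ", " dst
#define type_4			uint32_t
#define reg_imm_4(x)		"ri" (x)

#define my_to_op(size, op, _var, _val)					\
do {									\
	type_##size val__ = (type_##size)(unsigned long)(_val);	\
	asm(op2_##size(op, "%[val]", "%[var]")				\
	    : [var] "+m" (_var)						\
	    : [val] reg_imm_##size(val__));				\
} while (0)

int main(void)
{
	uint32_t counter = 40;

	my_to_op(4, "add", counter, 2);	/* emits one addl into counter */
	printf("%u\n", counter);	/* prints 42 */
	return 0;
}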

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
 1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4f7a8f..fb280fba94c5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
 #define __pcpu_reg_imm_4(x) "ri" (x)
 #define __pcpu_reg_imm_8(x) "re" (x)
 
-#define percpu_to_op(qual, op, var, val)		\
-do {							\
-	typedef typeof(var) pto_T__;			\
-	if (0) {					\
-		pto_T__ pto_tmp__;			\
-		pto_tmp__ = (val);			\
-		(void)pto_tmp__;			\
-	}						\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "qi" ((pto_T__)(val)));		\
-		break;					\
-	case 2:						\
-		asm qual (op "w %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 4:						\
-		asm qual (op "l %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "ri" ((pto_T__)(val)));		\
-		break;					\
-	case 8:						\
-		asm qual (op "q %1,"__percpu_arg(0)	\
-		    : "+m" (var)			\
-		    : "re" ((pto_T__)(val)));		\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
+#define percpu_to_op(size, qual, op, _var, _val)			\
+do {									\
+	__pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val);	\
+	if (0) {		                                        \
+		typeof(_var) pto_tmp__;					\
+		pto_tmp__ = (_val);					\
+		(void)pto_tmp__;					\
+	}								\
+	asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var]))	\
+	    : [var] "+m" (_var)						\
+	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
 /*
@@ -425,18 +405,18 @@ do {									\
 #define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
 #define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
 
-#define raw_cpu_write_1(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val)	percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val)	percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
 #define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
 #define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val)		percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val)		percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val)		percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val)		percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val)		percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val)		percpu_to_op(4, , "or", (pcp), val)
 
 /*
  * raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do {									\
 #define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
 #define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val)	percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
 #define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
 #define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val)	percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val)		percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
 #define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do {									\
  */
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val)		percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val)			percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val)			percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val)		percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val)			percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 03/10] x86/percpu: Clean up percpu_from_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op() Brian Gerst
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
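
The read side follows the same pattern.  A user-space analogue of the
rewritten 4-byte case (ordinary memory, invented names; the statement
expression casts the fixed-width temporary back to the variable's own
type, as the new macro does):

#include <stdint.h>
#include <stdio.h>

#define my_from_op_4(_var)						\
({									\
	uint32_t val__;							\
	asm("movl %[var], %[val]"					\
	    : [val] "=r" (val__)					\
	    : [var] "m" (_var));					\
	(typeof(_var))(unsigned long)val__;				\
})

int main(void)
{
	uint32_t x = 123;

	printf("%u\n", my_from_op_4(x));	/* prints 123 */
	return 0;
}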

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fba94c5..a40d2e055f58 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do {									\
 	}								\
 } while (0)
 
-#define percpu_from_op(qual, op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_from_op(size, qual, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "m" (_var));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 #define percpu_stable_op(op, var)			\
@@ -401,9 +381,9 @@ do {									\
  */
 #define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
 
-#define raw_cpu_read_1(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp)		percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp)		percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp)		percpu_from_op(4, , "mov", pcp)
 
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do {									\
 #define raw_cpu_xchg_2(pcp, val)	raw_percpu_xchg_op(pcp, val)
 #define raw_cpu_xchg_4(pcp, val)	raw_percpu_xchg_op(pcp, val)
 
-#define this_cpu_read_1(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp)		percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp)		percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp)		percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp)		percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp)		percpu_from_op(4, volatile, "mov", pcp)
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do {									\
  * 32 bit must fall back to generic operations.
  */
 #ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp)			percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do {									\
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_read_8(pcp)			percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
 #define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (2 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 03/10] x86/percpu: Clean up percpu_from_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 20:18   ` Nick Desaulniers
  2020-05-30 22:11 ` [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD Brian Gerst
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
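
The inc/dec special case is preserved.  In stand-alone form (plain C
instead of asm, invented my_add name) the selection works like this;
because the selector is a compile-time constant, the two dead branches
fold away and only one operation survives:

#include <stdio.h>

#define my_add(var, val)						\
do {									\
	const int id__ = (__builtin_constant_p(val) &&			\
			  ((val) == 1 || (val) == -1)) ?		\
				(int)(val) : 0;				\
	if (id__ == 1)							\
		(var)++;	/* the kernel macro emits "inc" here */	\
	else if (id__ == -1)						\
		(var)--;	/* ... "dec" here */			\
	else								\
		(var) += (val);	/* ... and "add" otherwise */		\
} while (0)

int main(void)
{
	int x = 5;

	my_add(x, 1);		/* constant 1: becomes a plain increment */
	my_add(x, 7);		/* anything else: the add path */
	printf("%d\n", x);	/* prints 13 */
	return 0;
}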

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
 1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e055f58..2a24f3c795eb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do {									\
 	    : [val] __pcpu_reg_imm_##size(pto_val__));			\
 } while (0)
 
+#define percpu_unary_op(size, qual, op, _var)				\
+({									\
+	asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))		\
+	    : [var] "+m" (_var));					\
+})
+
 /*
  * Generate a percpu add to memory instruction and optimize code
  * if one is added or subtracted.
  */
-#define percpu_add_op(qual, var, val)					\
+#define percpu_add_op(size, qual, var, val)				\
 do {									\
-	typedef typeof(var) pao_T__;					\
 	const int pao_ID__ = (__builtin_constant_p(val) &&		\
 			      ((val) == 1 || (val) == -1)) ?		\
 				(int)(val) : 0;				\
 	if (0) {							\
-		pao_T__ pao_tmp__;					\
+		typeof(var) pao_tmp__;					\
 		pao_tmp__ = (val);					\
 		(void)pao_tmp__;					\
 	}								\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incb "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decb "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addb %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "qi" ((pao_T__)(val)));			\
-		break;							\
-	case 2:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incw "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decw "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addw %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 4:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incl "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decl "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addl %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "ri" ((pao_T__)(val)));			\
-		break;							\
-	case 8:								\
-		if (pao_ID__ == 1)					\
-			asm qual ("incq "__percpu_arg(0) : "+m" (var));	\
-		else if (pao_ID__ == -1)				\
-			asm qual ("decq "__percpu_arg(0) : "+m" (var));	\
-		else							\
-			asm qual ("addq %1, "__percpu_arg(0)		\
-			    : "+m" (var)				\
-			    : "re" ((pao_T__)(val)));			\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
+	if (pao_ID__ == 1)						\
+		percpu_unary_op(size, qual, "inc", var);		\
+	else if (pao_ID__ == -1)					\
+		percpu_unary_op(size, qual, "dec", var);		\
+	else								\
+		percpu_to_op(size, qual, "add", var, val);		\
 } while (0)
 
 #define percpu_from_op(size, qual, op, _var)				\
@@ -228,29 +196,6 @@ do {									\
 	pfo_ret__;					\
 })
 
-#define percpu_unary_op(qual, op, var)			\
-({							\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm qual (op "b "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 2:						\
-		asm qual (op "w "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 4:						\
-		asm qual (op "l "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	case 8:						\
-		asm qual (op "q "__percpu_arg(0)	\
-		    : "+m" (var));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-})
-
 /*
  * Add return operation
  */
@@ -388,9 +333,9 @@ do {									\
 #define raw_cpu_write_1(pcp, val)	percpu_to_op(1, , "mov", (pcp), val)
 #define raw_cpu_write_2(pcp, val)	percpu_to_op(2, , "mov", (pcp), val)
 #define raw_cpu_write_4(pcp, val)	percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val)		percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val)		percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val)		percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val)		percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val)		percpu_add_op(4, , (pcp), val)
 #define raw_cpu_and_1(pcp, val)		percpu_to_op(1, , "and", (pcp), val)
 #define raw_cpu_and_2(pcp, val)		percpu_to_op(2, , "and", (pcp), val)
 #define raw_cpu_and_4(pcp, val)		percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do {									\
 #define this_cpu_write_1(pcp, val)	percpu_to_op(1, volatile, "mov", (pcp), val)
 #define this_cpu_write_2(pcp, val)	percpu_to_op(2, volatile, "mov", (pcp), val)
 #define this_cpu_write_4(pcp, val)	percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val)	percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val)	percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val)	percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val)	percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val)	percpu_add_op(4, volatile, (pcp), val)
 #define this_cpu_and_1(pcp, val)	percpu_to_op(1, volatile, "and", (pcp), val)
 #define this_cpu_and_2(pcp, val)	percpu_to_op(2, volatile, "and", (pcp), val)
 #define this_cpu_and_4(pcp, val)	percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do {									\
 #ifdef CONFIG_X86_64
 #define raw_cpu_read_8(pcp)			percpu_from_op(8, , "mov", pcp)
 #define raw_cpu_write_8(pcp, val)		percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val)			percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do {									\
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val)		percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (3 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 18:50   ` Nick Desaulniers
  2020-05-30 22:11 ` [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.
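
For reference, a minimal user-space sketch of the 8-byte case (x86-64
only, invented helper name) with the plain "+r" constraint that remains
after this change; XADD adds the register into memory and hands back
the previous memory value in the register:

#include <stdint.h>
#include <stdio.h>

static uint64_t xadd_u64(uint64_t *p, uint64_t v)
{
	asm volatile("xaddq %[v], %[p]"
		     : [v] "+r" (v), [p] "+m" (*p)
		     : : "memory");
	return v;	/* previous value of *p */
}

int main(void)
{
	uint64_t x = 10;
	uint64_t old = xadd_u64(&x, 5);

	printf("old=%llu new=%llu\n",
	       (unsigned long long)old, (unsigned long long)x);
	/* prints: old=10 new=15 */
	return 0;
}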

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c795eb..9bb5440d98d3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do {									\
 		break;							\
 	case 8:								\
 		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+re" (paro_ret__), "+m" (var)		\
+			    : "+r" (paro_ret__), "+m" (var)		\
 			    : : "memory");				\
 		break;							\
 	default: __bad_percpu_size();					\
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (4 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 19:47   ` Nick Desaulniers
  2020-05-30 22:11 ` [PATCH v2 07/10] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
 1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440d98d3..0776a11e7e11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do {									\
 /*
  * Add return operation
  */
-#define percpu_add_return_op(qual, var, val)				\
+#define percpu_add_return_op(size, qual, _var, _val)			\
 ({									\
-	typeof(var) paro_ret__ = val;					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("xaddb %0, "__percpu_arg(1)			\
-			    : "+q" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("xaddw %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("xaddl %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("xaddq %0, "__percpu_arg(1)			\
-			    : "+r" (paro_ret__), "+m" (var)		\
-			    : : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	paro_ret__ += val;						\
-	paro_ret__;							\
+	__pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);	\
+	asm qual (__pcpu_op2_##size("xadd", "%[tmp]",			\
+				     __percpu_arg([var]))		\
+		  : [tmp] __pcpu_reg_##size("+", paro_tmp__),		\
+		    [var] "+m" (_var)					\
+		  : : "memory");					\
+	(typeof(_var))(unsigned long) (paro_tmp__ + _val);		\
 })
 
 /*
@@ -377,16 +358,16 @@ do {									\
 #define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
 
-#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
 #define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 #define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
-#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
 #define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do {									\
 #define raw_cpu_add_8(pcp, val)			percpu_add_op(8, , (pcp), val)
 #define raw_cpu_and_8(pcp, val)			percpu_to_op(8, , "and", (pcp), val)
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
 #define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
 
@@ -427,7 +408,7 @@ do {									\
 #define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 07/10] x86/percpu: Clean up percpu_xchg_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (5 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 08/10] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.
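
The mov + cmpxchg retry loop itself is unchanged.  As a stand-alone
illustration (user-space, ordinary memory, invented name), only valid
when nothing else can modify the variable concurrently, which is
exactly the per-cpu case:

#include <stdint.h>
#include <stdio.h>

static uint32_t local_xchg_u32(uint32_t *p, uint32_t nval)
{
	uint32_t old;

	/*
	 * No LOCK prefix: on failure cmpxchg reloads the current value
	 * into %eax, so the loop simply retries until the store lands.
	 */
	asm volatile("movl %[p], %[old]\n"
		     "1:\tcmpxchgl %[new], %[p]\n\t"
		     "jnz 1b"
		     : [old] "=&a" (old), [p] "+m" (*p)
		     : [new] "r" (nval)
		     : "memory");
	return old;
}

int main(void)
{
	uint32_t v = 7;
	uint32_t old = local_xchg_u32(&v, 9);

	printf("old=%u new=%u\n", old, v);	/* prints: old=7 new=9 */
	return 0;
}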

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
 1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11e7e11..ac6d7e76c0d4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do {									\
  * expensive due to the implied lock prefix.  The processor cannot prefetch
  * cachelines if xchg is used.
  */
-#define percpu_xchg_op(qual, var, nval)					\
+#define percpu_xchg_op(size, qual, _var, _nval)				\
 ({									\
-	typeof(var) pxo_ret__;						\
-	typeof(var) pxo_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%al"		\
-		    "\n1:\tcmpxchgb %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "q" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%ax"		\
-		    "\n1:\tcmpxchgw %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%eax"		\
-		    "\n1:\tcmpxchgl %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("\n\tmov "__percpu_arg(1)",%%rax"		\
-		    "\n1:\tcmpxchgq %2, "__percpu_arg(1)		\
-		    "\n\tjnz 1b"					\
-			    : "=&a" (pxo_ret__), "+m" (var)		\
-			    : "r" (pxo_new__)				\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pxo_ret__;							\
+	__pcpu_type_##size pxo_old__;					\
+	__pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]),		\
+				    "%[oval]")				\
+		  "\n1:\t"						\
+		  __pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  "\n\tjnz 1b"						\
+		  : [oval] "=&a" (pxo_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pxo_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pxo_old__;			\
 })
 
 /*
@@ -354,9 +329,9 @@ do {									\
 #define this_cpu_or_1(pcp, val)		percpu_to_op(1, volatile, "or", (pcp), val)
 #define this_cpu_or_2(pcp, val)		percpu_to_op(2, volatile, "or", (pcp), val)
 #define this_cpu_or_4(pcp, val)		percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval)	percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval)	percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval)	percpu_xchg_op(4, volatile, pcp, nval)
 
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do {									\
 #define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
 
 /*
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 08/10] x86/percpu: Clean up percpu_cmpxchg_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (6 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 07/10] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-05-30 22:11 ` [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op() Brian Gerst
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions.  This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e76c0d4..7efc0b5c4ff0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do {									\
  * cmpxchg has no such implied lock semantics as a result it is much
  * more efficient for cpu local operations.
  */
-#define percpu_cmpxchg_op(qual, var, oval, nval)			\
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)		\
 ({									\
-	typeof(var) pco_ret__;						\
-	typeof(var) pco_old__ = (oval);					\
-	typeof(var) pco_new__ = (nval);					\
-	switch (sizeof(var)) {						\
-	case 1:								\
-		asm qual ("cmpxchgb %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "q" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 2:								\
-		asm qual ("cmpxchgw %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 4:								\
-		asm qual ("cmpxchgl %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	case 8:								\
-		asm qual ("cmpxchgq %2, "__percpu_arg(1)		\
-			    : "=a" (pco_ret__), "+m" (var)		\
-			    : "r" (pco_new__), "0" (pco_old__)		\
-			    : "memory");				\
-		break;							\
-	default: __bad_percpu_size();					\
-	}								\
-	pco_ret__;							\
+	__pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);	\
+	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
+	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+				    __percpu_arg([var]))		\
+		  : [oval] "+a" (pco_old__),				\
+		    [var] "+m" (_var)					\
+		  : [nval] __pcpu_reg_##size(, pco_new__)		\
+		  : "memory");						\
+	(typeof(_var))(unsigned long) pco_old__;			\
 })
 
 /*
@@ -336,16 +314,16 @@ do {									\
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
 #define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
 #define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)		\
@@ -376,7 +354,7 @@ do {									\
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
 
 #define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
 #define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do {									\
 #define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
 #define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
 
 /*
  * Pretty complex macro to generate cmpxchg16 instruction.  The instruction
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (7 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 08/10] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 20:43   ` Nick Desaulniers
  2020-05-30 22:11 ` [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro Brian Gerst
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size() which is now unused.
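
__pcpu_size_call_return() is the generic helper in the common percpu
headers that dispatches on sizeof() to the per-width backends; a
simplified sketch of the idea (not the exact kernel definition, and
with an invented name) looks roughly like:

#define my_size_call_return(stem, variable)				\
({									\
	typeof(variable) ret__;						\
	switch (sizeof(variable)) {					\
	case 1: ret__ = stem##1(variable); break;			\
	case 2: ret__ = stem##2(variable); break;			\
	case 4: ret__ = stem##4(variable); break;			\
	case 8: ret__ = stem##8(variable); break;			\
	}								\
	ret__;								\
})

So this_cpu_read_stable() on an int-sized variable ends up calling
this_cpu_read_stable_4(), and the other widths are handled the same
way.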

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5c4ff0..cf2b9c2a241e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@
 
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
-extern void __bad_percpu_size(void);
 
 #define __pcpu_type_1 u8
 #define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do {									\
 	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
-#define percpu_stable_op(op, var)			\
-({							\
-	typeof(var) pfo_ret__;				\
-	switch (sizeof(var)) {				\
-	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
-		    : "=q" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
-		    : "=r" (pfo_ret__)			\
-		    : "p" (&(var)));			\
-		break;					\
-	default: __bad_percpu_size();			\
-	}						\
-	pfo_ret__;					\
+#define percpu_stable_op(size, op, _var)				\
+({									\
+	__pcpu_type_##size pfo_val__;					\
+	asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")	\
+	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
+	    : [var] "p" (&(_var)));					\
+	(typeof(_var))(unsigned long) pfo_val__;			\
 })
 
 /*
@@ -258,7 +237,11 @@ do {									\
  * per-thread variables implemented as per-cpu variables and thus
  * stable for the duration of the respective task.
  */
-#define this_cpu_read_stable(var)	percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp)	percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp)	percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp)	percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp)	percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp)	__pcpu_size_call_return(this_cpu_read_stable_, pcp)
 
 #define raw_cpu_read_1(pcp)		percpu_from_op(1, , "mov", pcp)
 #define raw_cpu_read_2(pcp)		percpu_from_op(2, , "mov", pcp)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (8 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op() Brian Gerst
@ 2020-05-30 22:11 ` Brian Gerst
  2020-06-01 20:26   ` Nick Desaulniers
  2020-06-01 21:00 ` [PATCH v2 00/10] x86: Clean up percpu operations Nick Desaulniers
  2020-07-09 10:30 ` Peter Zijlstra
  11 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-05-30 22:11 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Nick Desaulniers, Brian Gerst

Also remove now unused __percpu_mov_op.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2a241e..a3c33b79fb86 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@
 
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
-#define __percpu_mov_op		movq
 #else
 #define __percpu_seg		fs
-#define __percpu_mov_op		movl
 #endif
 
 #ifdef __ASSEMBLY__
 
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, reg)						\
-	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
-	lea var(reg), reg
 #define PER_CPU_VAR(var)	%__percpu_seg:var
 #else /* ! SMP */
-#define PER_CPU(var, reg)	__percpu_mov_op $var, reg
 #define PER_CPU_VAR(var)	var
 #endif	/* SMP */
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD
  2020-05-30 22:11 ` [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD Brian Gerst
@ 2020-06-01 18:50   ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 18:50 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> The "e" constraint represents a constant, but the XADD instruction doesn't
> accept immediate operands.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Yep, as we discussed in v1.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 2a24f3c795eb..9bb5440d98d3 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -220,7 +220,7 @@ do {                                                                        \
>                 break;                                                  \
>         case 8:                                                         \
>                 asm qual ("xaddq %0, "__percpu_arg(1)                   \
> -                           : "+re" (paro_ret__), "+m" (var)            \
> +                           : "+r" (paro_ret__), "+m" (var)             \
>                             : : "memory");                              \
>                 break;                                                  \
>         default: __bad_percpu_size();                                   \
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op()
  2020-05-30 22:11 ` [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
@ 2020-06-01 19:47   ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 19:47 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

I think it would have been OK to carry forward my Reviewed-by tag here
from v1, hidden at the bottom of
https://lore.kernel.org/lkml/CAKwvOdn7yC1GVA+6gtNewBSq2BK09y9iNWhv1dPFF5i4kT1+6A@mail.gmail.com/
even though you split removing 'e' into its own patch.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
>  1 file changed, 16 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 9bb5440d98d3..0776a11e7e11 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -199,34 +199,15 @@ do {                                                                      \
>  /*
>   * Add return operation
>   */
> -#define percpu_add_return_op(qual, var, val)                           \
> +#define percpu_add_return_op(size, qual, _var, _val)                   \
>  ({                                                                     \
> -       typeof(var) paro_ret__ = val;                                   \
> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               asm qual ("xaddb %0, "__percpu_arg(1)                   \
> -                           : "+q" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 2:                                                         \
> -               asm qual ("xaddw %0, "__percpu_arg(1)                   \
> -                           : "+r" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 4:                                                         \
> -               asm qual ("xaddl %0, "__percpu_arg(1)                   \
> -                           : "+r" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       case 8:                                                         \
> -               asm qual ("xaddq %0, "__percpu_arg(1)                   \
> -                           : "+r" (paro_ret__), "+m" (var)             \
> -                           : : "memory");                              \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \
> -       }                                                               \
> -       paro_ret__ += val;                                              \
> -       paro_ret__;                                                     \
> +       __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val);       \
> +       asm qual (__pcpu_op2_##size("xadd", "%[tmp]",                   \
> +                                    __percpu_arg([var]))               \
> +                 : [tmp] __pcpu_reg_##size("+", paro_tmp__),           \
> +                   [var] "+m" (_var)                                   \
> +                 : : "memory");                                        \
> +       (typeof(_var))(unsigned long) (paro_tmp__ + _val);              \
>  })
>
>  /*
> @@ -377,16 +358,16 @@ do {                                                                      \
>  #define this_cpu_xchg_2(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_xchg_4(pcp, nval)     percpu_xchg_op(volatile, pcp, nval)
>
> -#define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_4(pcp, val)         percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_1(pcp, val)         percpu_add_return_op(1, , pcp, val)
> +#define raw_cpu_add_return_2(pcp, val)         percpu_add_return_op(2, , pcp, val)
> +#define raw_cpu_add_return_4(pcp, val)         percpu_add_return_op(4, , pcp, val)
>  #define raw_cpu_cmpxchg_1(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>  #define raw_cpu_cmpxchg_2(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>  #define raw_cpu_cmpxchg_4(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
> -#define this_cpu_add_return_1(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_2(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_4(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_1(pcp, val)                percpu_add_return_op(1, volatile, pcp, val)
> +#define this_cpu_add_return_2(pcp, val)                percpu_add_return_op(2, volatile, pcp, val)
> +#define this_cpu_add_return_4(pcp, val)                percpu_add_return_op(4, volatile, pcp, val)
>  #define this_cpu_cmpxchg_1(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>  #define this_cpu_cmpxchg_2(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>  #define this_cpu_cmpxchg_4(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
> @@ -418,7 +399,7 @@ do {                                                                        \
>  #define raw_cpu_add_8(pcp, val)                        percpu_add_op(8, , (pcp), val)
>  #define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
>  #define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
> -#define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(8, , pcp, val)
>  #define raw_cpu_xchg_8(pcp, nval)              raw_percpu_xchg_op(pcp, nval)
>  #define raw_cpu_cmpxchg_8(pcp, oval, nval)     percpu_cmpxchg_op(, pcp, oval, nval)
>
> @@ -427,7 +408,7 @@ do {                                                                        \
>  #define this_cpu_add_8(pcp, val)               percpu_add_op(8, volatile, (pcp), val)
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
> -#define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(8, volatile, pcp, val)
>  #define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(volatile, pcp, nval)
>  #define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(volatile, pcp, oval, nval)
>
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op()
  2020-05-30 22:11 ` [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op() Brian Gerst
@ 2020-06-01 20:18   ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 20:18 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions.  This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
>  1 file changed, 22 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index a40d2e055f58..2a24f3c795eb 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -130,64 +130,32 @@ do {                                                                      \
>             : [val] __pcpu_reg_imm_##size(pto_val__));                  \
>  } while (0)
>
> +#define percpu_unary_op(size, qual, op, _var)                          \
> +({                                                                     \
> +       asm qual (__pcpu_op1_##size(op, __percpu_arg([var]))            \
> +           : [var] "+m" (_var));                                       \
> +})
> +
>  /*
>   * Generate a percpu add to memory instruction and optimize code
>   * if one is added or subtracted.
>   */
> -#define percpu_add_op(qual, var, val)                                  \
> +#define percpu_add_op(size, qual, var, val)                            \
>  do {                                                                   \
> -       typedef typeof(var) pao_T__;                                    \
>         const int pao_ID__ = (__builtin_constant_p(val) &&              \
>                               ((val) == 1 || (val) == -1)) ?            \
>                                 (int)(val) : 0;                         \
>         if (0) {                                                        \
> -               pao_T__ pao_tmp__;                                      \
> +               typeof(var) pao_tmp__;                                  \
>                 pao_tmp__ = (val);                                      \
>                 (void)pao_tmp__;                                        \
>         }                                                               \
> -       switch (sizeof(var)) {                                          \
> -       case 1:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addb %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "qi" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 2:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addw %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "ri" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 4:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addl %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "ri" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       case 8:                                                         \
> -               if (pao_ID__ == 1)                                      \
> -                       asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
> -               else if (pao_ID__ == -1)                                \
> -                       asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
> -               else                                                    \
> -                       asm qual ("addq %1, "__percpu_arg(0)            \
> -                           : "+m" (var)                                \
> -                           : "re" ((pao_T__)(val)));                   \
> -               break;                                                  \
> -       default: __bad_percpu_size();                                   \
> -       }                                                               \
> +       if (pao_ID__ == 1)                                              \
> +               percpu_unary_op(size, qual, "inc", var);                \
> +       else if (pao_ID__ == -1)                                        \
> +               percpu_unary_op(size, qual, "dec", var);                \
> +       else                                                            \
> +               percpu_to_op(size, qual, "add", var, val);              \
>  } while (0)
>
>  #define percpu_from_op(size, qual, op, _var)                           \
> @@ -228,29 +196,6 @@ do {                                                                       \
>         pfo_ret__;                                      \
>  })
>
> -#define percpu_unary_op(qual, op, var)                 \
> -({                                                     \
> -       switch (sizeof(var)) {                          \
> -       case 1:                                         \
> -               asm qual (op "b "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 2:                                         \
> -               asm qual (op "w "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 4:                                         \
> -               asm qual (op "l "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       case 8:                                         \
> -               asm qual (op "q "__percpu_arg(0)        \
> -                   : "+m" (var));                      \
> -               break;                                  \
> -       default: __bad_percpu_size();                   \
> -       }                                               \
> -})
> -
>  /*
>   * Add return operation
>   */
> @@ -388,9 +333,9 @@ do {                                                                        \
>  #define raw_cpu_write_1(pcp, val)      percpu_to_op(1, , "mov", (pcp), val)
>  #define raw_cpu_write_2(pcp, val)      percpu_to_op(2, , "mov", (pcp), val)
>  #define raw_cpu_write_4(pcp, val)      percpu_to_op(4, , "mov", (pcp), val)
> -#define raw_cpu_add_1(pcp, val)                percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_2(pcp, val)                percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_4(pcp, val)                percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_1(pcp, val)                percpu_add_op(1, , (pcp), val)
> +#define raw_cpu_add_2(pcp, val)                percpu_add_op(2, , (pcp), val)
> +#define raw_cpu_add_4(pcp, val)                percpu_add_op(4, , (pcp), val)
>  #define raw_cpu_and_1(pcp, val)                percpu_to_op(1, , "and", (pcp), val)
>  #define raw_cpu_and_2(pcp, val)                percpu_to_op(2, , "and", (pcp), val)
>  #define raw_cpu_and_4(pcp, val)                percpu_to_op(4, , "and", (pcp), val)
> @@ -419,9 +364,9 @@ do {                                                                        \
>  #define this_cpu_write_1(pcp, val)     percpu_to_op(1, volatile, "mov", (pcp), val)
>  #define this_cpu_write_2(pcp, val)     percpu_to_op(2, volatile, "mov", (pcp), val)
>  #define this_cpu_write_4(pcp, val)     percpu_to_op(4, volatile, "mov", (pcp), val)
> -#define this_cpu_add_1(pcp, val)       percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_2(pcp, val)       percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_4(pcp, val)       percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_1(pcp, val)       percpu_add_op(1, volatile, (pcp), val)
> +#define this_cpu_add_2(pcp, val)       percpu_add_op(2, volatile, (pcp), val)
> +#define this_cpu_add_4(pcp, val)       percpu_add_op(4, volatile, (pcp), val)
>  #define this_cpu_and_1(pcp, val)       percpu_to_op(1, volatile, "and", (pcp), val)
>  #define this_cpu_and_2(pcp, val)       percpu_to_op(2, volatile, "and", (pcp), val)
>  #define this_cpu_and_4(pcp, val)       percpu_to_op(4, volatile, "and", (pcp), val)
> @@ -470,7 +415,7 @@ do {                                                                        \
>  #ifdef CONFIG_X86_64
>  #define raw_cpu_read_8(pcp)                    percpu_from_op(8, , "mov", pcp)
>  #define raw_cpu_write_8(pcp, val)              percpu_to_op(8, , "mov", (pcp), val)
> -#define raw_cpu_add_8(pcp, val)                        percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_8(pcp, val)                        percpu_add_op(8, , (pcp), val)
>  #define raw_cpu_and_8(pcp, val)                        percpu_to_op(8, , "and", (pcp), val)
>  #define raw_cpu_or_8(pcp, val)                 percpu_to_op(8, , "or", (pcp), val)
>  #define raw_cpu_add_return_8(pcp, val)         percpu_add_return_op(, pcp, val)
> @@ -479,7 +424,7 @@ do {                                                                        \
>
>  #define this_cpu_read_8(pcp)                   percpu_from_op(8, volatile, "mov", pcp)
>  #define this_cpu_write_8(pcp, val)             percpu_to_op(8, volatile, "mov", (pcp), val)
> -#define this_cpu_add_8(pcp, val)               percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_8(pcp, val)               percpu_add_op(8, volatile, (pcp), val)
>  #define this_cpu_and_8(pcp, val)               percpu_to_op(8, volatile, "and", (pcp), val)
>  #define this_cpu_or_8(pcp, val)                        percpu_to_op(8, volatile, "or", (pcp), val)
>  #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(volatile, pcp, val)
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro
  2020-05-30 22:11 ` [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro Brian Gerst
@ 2020-06-01 20:26   ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 20:26 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> Also remove now unused __percpu_mov_op.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

This cleanup looks unrelated to the series, and can be sent separately
if needed.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 18 ------------------
>  1 file changed, 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index cf2b9c2a241e..a3c33b79fb86 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -4,33 +4,15 @@
>
>  #ifdef CONFIG_X86_64
>  #define __percpu_seg           gs
> -#define __percpu_mov_op                movq
>  #else
>  #define __percpu_seg           fs
> -#define __percpu_mov_op                movl
>  #endif
>
>  #ifdef __ASSEMBLY__
>
> -/*
> - * PER_CPU finds an address of a per-cpu variable.
> - *
> - * Args:
> - *    var - variable name
> - *    reg - 32bit register
> - *
> - * The resulting address is stored in the "reg" argument.
> - *
> - * Example:
> - *    PER_CPU(cpu_gdt_descr, %ebx)
> - */
>  #ifdef CONFIG_SMP
> -#define PER_CPU(var, reg)                                              \
> -       __percpu_mov_op %__percpu_seg:this_cpu_off, reg;                \
> -       lea var(reg), reg
>  #define PER_CPU_VAR(var)       %__percpu_seg:var
>  #else /* ! SMP */
> -#define PER_CPU(var, reg)      __percpu_mov_op $var, reg
>  #define PER_CPU_VAR(var)       var
>  #endif /* SMP */
>
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()
  2020-05-30 22:11 ` [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op() Brian Gerst
@ 2020-06-01 20:43   ` Nick Desaulniers
  2020-06-02 14:19     ` Brian Gerst
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 20:43 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> Use __pcpu_size_call_return() to simplify this_cpu_read_stable().

Clever! As in this_cpu_read() in include/linux/percpu-defs.h.  Could
be its own patch before this, but it's fine.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> Also remove __bad_percpu_size() which is now unused.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
>  1 file changed, 12 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 7efc0b5c4ff0..cf2b9c2a241e 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -85,7 +85,6 @@
>
>  /* For arch-specific code, we can use direct single-insn ops (they
>   * don't give an lvalue though). */
> -extern void __bad_percpu_size(void);
>
>  #define __pcpu_type_1 u8
>  #define __pcpu_type_2 u16
> @@ -167,33 +166,13 @@ do {                                                                      \
>         (typeof(_var))(unsigned long) pfo_val__;                        \
>  })
>
> -#define percpu_stable_op(op, var)                      \
> -({                                                     \
> -       typeof(var) pfo_ret__;                          \
> -       switch (sizeof(var)) {                          \
> -       case 1:                                         \
> -               asm(op "b "__percpu_arg(P1)",%0"        \

What does the `P` do here?
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
says can be machine dependent integral literal in a certain range.
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
doesn't document `P` for x86 though...

> -                   : "=q" (pfo_ret__)                  \
> -                   : "p" (&(var)));                    \
> -               break;                                  \
> -       case 2:                                         \
> -               asm(op "w "__percpu_arg(P1)",%0"        \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "p" (&(var)));                    \
> -               break;                                  \
> -       case 4:                                         \
> -               asm(op "l "__percpu_arg(P1)",%0"        \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "p" (&(var)));                    \
> -               break;                                  \
> -       case 8:                                         \
> -               asm(op "q "__percpu_arg(P1)",%0"        \
> -                   : "=r" (pfo_ret__)                  \
> -                   : "p" (&(var)));                    \
> -               break;                                  \
> -       default: __bad_percpu_size();                   \
> -       }                                               \
> -       pfo_ret__;                                      \
> +#define percpu_stable_op(size, op, _var)                               \
> +({                                                                     \
> +       __pcpu_type_##size pfo_val__;                                   \
> +       asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")       \
> +           : [val] __pcpu_reg_##size("=", pfo_val__)                   \
> +           : [var] "p" (&(_var)));                                     \
> +       (typeof(_var))(unsigned long) pfo_val__;                        \
>  })
>
>  /*
> @@ -258,7 +237,11 @@ do {                                                                       \
>   * per-thread variables implemented as per-cpu variables and thus
>   * stable for the duration of the respective task.
>   */
> -#define this_cpu_read_stable(var)      percpu_stable_op("mov", var)
> +#define this_cpu_read_stable_1(pcp)    percpu_stable_op(1, "mov", pcp)
> +#define this_cpu_read_stable_2(pcp)    percpu_stable_op(2, "mov", pcp)
> +#define this_cpu_read_stable_4(pcp)    percpu_stable_op(4, "mov", pcp)
> +#define this_cpu_read_stable_8(pcp)    percpu_stable_op(8, "mov", pcp)
> +#define this_cpu_read_stable(pcp)      __pcpu_size_call_return(this_cpu_read_stable_, pcp)
>
>  #define raw_cpu_read_1(pcp)            percpu_from_op(1, , "mov", pcp)
>  #define raw_cpu_read_2(pcp)            percpu_from_op(2, , "mov", pcp)
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros
  2020-05-30 22:11 ` [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros Brian Gerst
@ 2020-06-01 20:48   ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 20:48 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> In preparation for cleaning up the percpu operations, define macros for
> abstraction based on the width of the operation.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 2278797c769d..19838e4f7a8f 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -87,6 +87,36 @@
>   * don't give an lvalue though). */
>  extern void __bad_percpu_size(void);
>
> +#define __pcpu_type_1 u8
> +#define __pcpu_type_2 u16
> +#define __pcpu_type_4 u32
> +#define __pcpu_type_8 u64
> +
> +#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
> +#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
> +#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
> +#define __pcpu_cast_8(val) ((u64)(val))
> +
> +#define __pcpu_op1_1(op, dst) op "b " dst
> +#define __pcpu_op1_2(op, dst) op "w " dst
> +#define __pcpu_op1_4(op, dst) op "l " dst
> +#define __pcpu_op1_8(op, dst) op "q " dst
> +
> +#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
> +#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
> +#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
> +#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
> +
> +#define __pcpu_reg_1(mod, x) mod "q" (x)
> +#define __pcpu_reg_2(mod, x) mod "r" (x)
> +#define __pcpu_reg_4(mod, x) mod "r" (x)
> +#define __pcpu_reg_8(mod, x) mod "r" (x)
> +
> +#define __pcpu_reg_imm_1(x) "qi" (x)
> +#define __pcpu_reg_imm_2(x) "ri" (x)
> +#define __pcpu_reg_imm_4(x) "ri" (x)
> +#define __pcpu_reg_imm_8(x) "re" (x)
> +
>  #define percpu_to_op(qual, op, var, val)               \
>  do {                                                   \
>         typedef typeof(var) pto_T__;                    \
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (9 preceding siblings ...)
  2020-05-30 22:11 ` [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro Brian Gerst
@ 2020-06-01 21:00 ` Nick Desaulniers
  2020-07-08 19:36   ` Nick Desaulniers
  2020-07-09 10:30 ` Peter Zijlstra
  11 siblings, 1 reply; 32+ messages in thread
From: Nick Desaulniers @ 2020-06-01 21:00 UTC (permalink / raw)
  To: Brian Gerst
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Dmitry Golovin

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> The core percpu operations already have a switch on the width of the
> data type, which resulted in an extra amount of dead code being
> generated with the x86 operations having another switch.  This patch set
> rewrites the x86 ops to remove the switch.  Additional cleanups are to
> use named assembly operands, and to cast variables to the width used in
> the assembly to make Clang happy.

Thanks for all of the work that went into this series.  I think I've
reviewed all of them.
With this series plus this hunk:
https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
I can build and boot i386_defconfig with Clang! So for the series:

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>

>
> Changes from v1:
> - Add separate patch for XADD constraint fix
> - Fixed sparse truncation warning
> - Add cleanup of percpu_stable_op()
> - Add patch to Remove PER_CPU()
>
> Brian Gerst (10):
>   x86/percpu: Introduce size abstraction macros
>   x86/percpu: Clean up percpu_to_op()
>   x86/percpu: Clean up percpu_from_op()
>   x86/percpu: Clean up percpu_add_op()
>   x86/percpu: Remove "e" constraint from XADD
>   x86/percpu: Clean up percpu_add_return_op()
>   x86/percpu: Clean up percpu_xchg_op()
>   x86/percpu: Clean up percpu_cmpxchg_op()
>   x86/percpu: Clean up percpu_stable_op()
>   x86/percpu: Remove unused PER_CPU() macro
>
>  arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
>  1 file changed, 172 insertions(+), 338 deletions(-)
>
>
> base-commit: 229aaa8c059f2c908e0561453509f996f2b2d5c4
> --
> 2.25.4
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()
  2020-06-01 20:43   ` Nick Desaulniers
@ 2020-06-02 14:19     ` Brian Gerst
  2020-07-09 10:25       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-06-02 14:19 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Peter Zijlstra

On Mon, Jun 1, 2020 at 4:43 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
>
> Clever! As in this_cpu_read() in include/linux/percpu-defs.h.  Could
> be its own patch before this, but it's fine.
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
>
> > Also remove __bad_percpu_size() which is now unused.
> >
> > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > ---
> >  arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
> >  1 file changed, 12 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index 7efc0b5c4ff0..cf2b9c2a241e 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -85,7 +85,6 @@
> >
> >  /* For arch-specific code, we can use direct single-insn ops (they
> >   * don't give an lvalue though). */
> > -extern void __bad_percpu_size(void);
> >
> >  #define __pcpu_type_1 u8
> >  #define __pcpu_type_2 u16
> > @@ -167,33 +166,13 @@ do {                                                                      \
> >         (typeof(_var))(unsigned long) pfo_val__;                        \
> >  })
> >
> > -#define percpu_stable_op(op, var)                      \
> > -({                                                     \
> > -       typeof(var) pfo_ret__;                          \
> > -       switch (sizeof(var)) {                          \
> > -       case 1:                                         \
> > -               asm(op "b "__percpu_arg(P1)",%0"        \
>
> What does the `P` do here?
> https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
> says can be machine dependent integral literal in a certain range.
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> doesn't document `P` for x86 though...

https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Extended-Asm.html#x86-Operand-Modifiers

Removing the 'P' modifier results in this:
        movq %gs:$current_task, %rdx    #, pfo_val__

This is because the 'p' constraint treats a memory address as a
constant.  I tried replacing it with __this_cpu_read(), which since
commit 0b9ccc0a should have similar non-volatile semantics.  But the
compiler still reloaded it on every use, so I left the asm template
as-is for now until that can be resolved.
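
To illustrate (a rough sketch assuming kernel context, not taken from the
patch; the function name is made up and current_task just stands in for any
percpu variable):

	static __always_inline struct task_struct *stable_read_current(void)
	{
		struct task_struct *val;

		/*
		 * With the "p" constraint and no modifier ("%[var]"), the
		 * address is printed as an immediate, which is not a valid
		 * %gs-relative memory operand:
		 *
		 *	movq %gs:$current_task, %rax
		 *
		 * The 'P' operand modifier drops the '$' prefix:
		 */
		asm("movq %%gs:%P[var], %[val]"
		    : [val] "=r" (val)
		    : [var] "p" (&current_task));
		/* emits something like: movq %gs:current_task, %rax */

		return val;
	}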

--
Brian Gerst

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-06-01 21:00 ` [PATCH v2 00/10] x86: Clean up percpu operations Nick Desaulniers
@ 2020-07-08 19:36   ` Nick Desaulniers
  2020-07-09 20:02     ` Nick Desaulniers
  2020-07-13 22:24     ` Nick Desaulniers
  0 siblings, 2 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-08 19:36 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Linus Torvalds
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva

On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > The core percpu operations already have a switch on the width of the
> > data type, which resulted in an extra amount of dead code being
> > generated with the x86 operations having another switch.  This patch set
> > rewrites the x86 ops to remove the switch.  Additional cleanups are to
> > use named assembly operands, and to cast variables to the width used in
> > the assembly to make Clang happy.
>
> Thanks for all of the work that went into this series.  I think I've
> reviewed all of them.
> With this series plus this hunk:
> https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> I can build and boot i386_defconfig with Clang! So for the series:
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>

tglx, Ingo, Boris, Linus,
Do you all have thoughts on this series?  I can understand "let
sleeping dogs lie" but some Android folks are really interested in
i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
are currently broken w/ Clang. This series gets us closer to having
test coverage of this ISA with another toolchain, FWIW.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()
  2020-06-02 14:19     ` Brian Gerst
@ 2020-07-09 10:25       ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-07-09 10:25 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Nick Desaulniers, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski

On Tue, Jun 02, 2020 at 10:19:51AM -0400, Brian Gerst wrote:
> On Mon, Jun 1, 2020 at 4:43 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
> >
> > Clever! As in this_cpu_read() in include/linux/percpu-defs.h.  Could
> > be its own patch before this, but it's fine.
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> >
> > > Also remove __bad_percpu_size() which is now unused.
> > >
> > > Signed-off-by: Brian Gerst <brgerst@gmail.com>
> > > ---
> > >  arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
> > >  1 file changed, 12 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > index 7efc0b5c4ff0..cf2b9c2a241e 100644
> > > --- a/arch/x86/include/asm/percpu.h
> > > +++ b/arch/x86/include/asm/percpu.h
> > > @@ -85,7 +85,6 @@
> > >
> > >  /* For arch-specific code, we can use direct single-insn ops (they
> > >   * don't give an lvalue though). */
> > > -extern void __bad_percpu_size(void);
> > >
> > >  #define __pcpu_type_1 u8
> > >  #define __pcpu_type_2 u16
> > > @@ -167,33 +166,13 @@ do {                                                                      \
> > >         (typeof(_var))(unsigned long) pfo_val__;                        \
> > >  })
> > >
> > > -#define percpu_stable_op(op, var)                      \
> > > -({                                                     \
> > > -       typeof(var) pfo_ret__;                          \
> > > -       switch (sizeof(var)) {                          \
> > > -       case 1:                                         \
> > > -               asm(op "b "__percpu_arg(P1)",%0"        \
> >
> > What does the `P` do here?
> > https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
> > says can be machine dependent integral literal in a certain range.
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> > doesn't document `P` for x86 though...
> 
> https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Extended-Asm.html#x86-Operand-Modifiers
> 
> Removing the 'P' modifier results in this:
>         movq %gs:$current_task, %rdx    #, pfo_val__
> 
> This is because the 'p' constraint treats a memory address as a
> constant.  I tried replacing it with __this_cpu_read(), which since
> commit 0b9ccc0a should have similar non-volatile semantics.  But the
> compiler still reloaded it on every use, so I left the asm template
> as-is for now until that can be resolved.

Right, I ran into that same issue a while back and gave up staring at
compiler innards. __this_cpu_read() *should* allow re-loads, and it does
in places (we've had bugs because of it), but this_cpu_read_stable() is
somehow far more 'effective'.

It would be good if someone could update the comment with that
explanation, to spell this out better.
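
A rough sketch of the kind of comment that could go there (wording is mine,
and the CSE detail is my understanding rather than something stated in the
series):

	/*
	 * this_cpu_read_stable() is a non-volatile asm whose only input is
	 * the constant address of the variable ("p" constraint), so the
	 * compiler may CSE repeated reads within a function.  Only use it
	 * for values that cannot change for the lifetime of the current
	 * task (e.g. current_task); use this_cpu_read() for anything that
	 * can be updated.
	 */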

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()
  2020-05-30 22:11 ` [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op() Brian Gerst
@ 2020-07-09 10:30   ` Peter Zijlstra
  2020-07-10  4:38     ` Brian Gerst
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-07-09 10:30 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski, Nick Desaulniers

On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> +	if (0) {		                                        \
> +		typeof(_var) pto_tmp__;					\
> +		pto_tmp__ = (_val);					\
> +		(void)pto_tmp__;					\
> +	}								\

This is repeated at least once more; and it looks very similar to
__typecheck() and typecheck() but is yet another variant afaict.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
                   ` (10 preceding siblings ...)
  2020-06-01 21:00 ` [PATCH v2 00/10] x86: Clean up percpu operations Nick Desaulniers
@ 2020-07-09 10:30 ` Peter Zijlstra
  11 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-07-09 10:30 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Andy Lutomirski, Nick Desaulniers

On Sat, May 30, 2020 at 06:11:17PM -0400, Brian Gerst wrote:
> The core percpu operations already have a switch on the width of the
> data type, which resulted in an extra amount of dead code being
> generated with the x86 operations having another switch.  This patch set
> rewrites the x86 ops to remove the switch.  Additional cleanups are to
> use named assembly operands, and to cast variables to the width used in
> the assembly to make Clang happy.
> 
> Changes from v1:
> - Add separate patch for XADD constraint fix
> - Fixed sparse truncation warning
> - Add cleanup of percpu_stable_op()
> - Add patch to Remove PER_CPU()
> 
> Brian Gerst (10):
>   x86/percpu: Introduce size abstraction macros
>   x86/percpu: Clean up percpu_to_op()
>   x86/percpu: Clean up percpu_from_op()
>   x86/percpu: Clean up percpu_add_op()
>   x86/percpu: Remove "e" constraint from XADD
>   x86/percpu: Clean up percpu_add_return_op()
>   x86/percpu: Clean up percpu_xchg_op()
>   x86/percpu: Clean up percpu_cmpxchg_op()
>   x86/percpu: Clean up percpu_stable_op()
>   x86/percpu: Remove unused PER_CPU() macro
> 
>  arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
>  1 file changed, 172 insertions(+), 338 deletions(-)

Nice!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-08 19:36   ` Nick Desaulniers
@ 2020-07-09 20:02     ` Nick Desaulniers
  2020-07-13 22:24     ` Nick Desaulniers
  1 sibling, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-09 20:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner

On Wed, Jul 8, 2020 at 12:36 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > The core percpu operations already have a switch on the width of the
> > > data type, which resulted in an extra amount of dead code being
> > > generated with the x86 operations having another switch.  This patch set
> > > rewrites the x86 ops to remove the switch.  Additional cleanups are to
> > > use named assembly operands, and to cast variables to the width used in
> > > the assembly to make Clang happy.
> >
> > Thanks for all of the work that went into this series.  I think I've
> > reviewed all of them.
> > With this series plus this hunk:
> > https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> > I can build and boot i386_defconfig with Clang! So for the series:
> >
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>
> tglx, Ingo, Boris, Linus,
> Do you all have thoughts on this series?  I can understand "let
> sleeping dogs lie" but some Android folks are really interested in
> i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
> are currently broken w/ Clang. This series gets us closer to having
> test coverage of this ISA with another toolchain, FWIW.

Oh, was https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ec4476ac825
alluding to this, perchance?

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()
  2020-07-09 10:30   ` Peter Zijlstra
@ 2020-07-10  4:38     ` Brian Gerst
  2020-07-10  8:53       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Gerst @ 2020-07-10  4:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Nick Desaulniers

On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > +     if (0) {                                                        \
> > +             typeof(_var) pto_tmp__;                                 \
> > +             pto_tmp__ = (_val);                                     \
> > +             (void)pto_tmp__;                                        \
> > +     }                                                               \
>
> This is repeated at least once more; and it looks very similar to
> __typecheck() and typecheck() but is yet another variant afaict.

The problem with typecheck() is that it will complain about a mismatch
between unsigned long and u64 (defined as unsigned long long) even
though both are 64-bits wide on x86-64.  Cleaning that mess up is
beyond the scope of this series, so I kept the existing checks.
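
A small sketch of the difference, for reference (illustrative only, assuming
kernel headers for u64 and typecheck(); typecheck() compares pointers to two
dummy variables, so it warns on any type mismatch, while the if (0)
assignment only trips the usual implicit-conversion warnings):

	static inline void percpu_typecheck_demo(u64 v)
	{
		typecheck(unsigned long, v);	/* warns: comparison of
						 * distinct pointer types
						 * (unsigned long * vs
						 * unsigned long long *),
						 * even though both are
						 * 64 bits wide on x86-64 */

		if (0) {			/* the existing style of check */
			unsigned long pto_tmp__;
			pto_tmp__ = v;		/* silent: plain integer
						 * assignment between
						 * same-width types */
			(void)pto_tmp__;
		}
	}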

--
Brian Gerst

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()
  2020-07-10  4:38     ` Brian Gerst
@ 2020-07-10  8:53       ` Peter Zijlstra
  2020-07-10 16:56         ` Nick Desaulniers
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-07-10  8:53 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski, Nick Desaulniers

On Fri, Jul 10, 2020 at 12:38:23AM -0400, Brian Gerst wrote:
> On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > > +     if (0) {                                                        \
> > > +             typeof(_var) pto_tmp__;                                 \
> > > +             pto_tmp__ = (_val);                                     \
> > > +             (void)pto_tmp__;                                        \
> > > +     }                                                               \
> >
> > This is repeated at least once more; and it looks very similar to
> > __typecheck() and typecheck() but is yet another variant afaict.
> 
> The problem with typecheck() is that it will complain about a mismatch
> between unsigned long and u64 (defined as unsigned long long) even
> though both are 64-bits wide on x86-64.  Cleaning that mess up is
> beyond the scope of this series, so I kept the existing checks.

Fair enough; thanks for explaining.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()
  2020-07-10  8:53       ` Peter Zijlstra
@ 2020-07-10 16:56         ` Nick Desaulniers
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-10 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Brian Gerst, Linux Kernel Mailing List, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andy Lutomirski

On Fri, Jul 10, 2020 at 1:53 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jul 10, 2020 at 12:38:23AM -0400, Brian Gerst wrote:
> > On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > > > +     if (0) {                                                        \
> > > > +             typeof(_var) pto_tmp__;                                 \
> > > > +             pto_tmp__ = (_val);                                     \
> > > > +             (void)pto_tmp__;                                        \
> > > > +     }                                                               \
> > >
> > > This is repeated at least once more; and it looks very similar to
> > > __typecheck() and typecheck() but is yet another variant afaict.
> >
> > The problem with typecheck() is that it will complain about a mismatch
> > between unsigned long and u64 (defined as unsigned long long) even
> > though both are 64-bits wide on x86-64.  Cleaning that mess up is
> > beyond the scope of this series, so I kept the existing checks.
>
> Fair enough; thanks for explaining.

I brought up the same point in v1, for more context:
https://lore.kernel.org/lkml/CAKwvOdnCcpS_9A2y9tMqeiAg2NfcVx=gNeA2V=+zHknit7wGkg@mail.gmail.com/
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-08 19:36   ` Nick Desaulniers
  2020-07-09 20:02     ` Nick Desaulniers
@ 2020-07-13 22:24     ` Nick Desaulniers
  2020-07-13 22:40       ` Linus Torvalds
  1 sibling, 1 reply; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-13 22:24 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Linus Torvalds
  Cc: LKML, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Steven Rostedt

On Wed, Jul 8, 2020 at 12:36 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > The core percpu operations already have a switch on the width of the
> > > data type, which resulted in an extra amount of dead code being
> > > generated with the x86 operations having another switch.  This patch set
> > > rewrites the x86 ops to remove the switch.  Additional cleanups are to
> > > use named assembly operands, and to cast variables to the width used in
> > > the assembly to make Clang happy.
> >
> > Thanks for all of the work that went into this series.  I think I've
> > reviewed all of them.
> > With this series plus this hunk:
> > https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> > I can build and boot i386_defconfig with Clang! So for the series:
> >
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>
> tglx, Ingo, Boris, Linus,
> Do you all have thoughts on this series?  I can understand "let
> sleeping dogs lie" but some Android folks are really interested in
> i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
> are currently broken w/ Clang. This series gets us closer to having
> test coverage of this ISA with another toolchain, FWIW.

I'm trying to organize an LLVM micro conference at plumbers.  I think
a session on "clang and remaining i386 blockers" might be of interest,
which would cover why the existing code pattern is problematic for
compiler portability.  This series in particular would be brought up.

Are you all planning on attending plumbers this year?  Might such a
session be of interest?

Otherwise, is there any additional feedback on this series or is it good to go?
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-13 22:24     ` Nick Desaulniers
@ 2020-07-13 22:40       ` Linus Torvalds
  2020-07-13 22:58         ` Nick Desaulniers
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2020-07-13 22:40 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Steven Rostedt

On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Otherwise, is there any additional feedback on this series or is it good to go?

I've lost sight of the series. I'm sure it is fine, but maybe you can
resend it to me (in private, if it's already been going out on the
mailing lists and everybody else is completely fed up with it).

And no, pointing to the "plus this hunk" with a web link isn't what I
was looking for ;)

             Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-13 22:40       ` Linus Torvalds
@ 2020-07-13 22:58         ` Nick Desaulniers
  2020-07-14  0:31           ` Nick Desaulniers
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-13 22:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Steven Rostedt

On Mon, Jul 13, 2020 at 3:40 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > Otherwise, is there any additional feedback on this series or is it good to go?
>
> I've lost sight of the series. I'm sure it is fine, but maybe you can
> resend it to me (in private, if it's already been going out on the
> mailing lists and everybody else is completely fed up with it).

Is there a fast way for maintainers to amend ACKs to each commit in a series?

>
> And no, pointing to the "plus this hunk" with a web link isn't what I
> was looking for ;)

So you're not accepting pull requests yet on github? I jest.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-13 22:58         ` Nick Desaulniers
@ 2020-07-14  0:31           ` Nick Desaulniers
  2020-07-14  1:29             ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Desaulniers @ 2020-07-14  0:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Steven Rostedt,
	clang-built-linux

On Mon, Jul 13, 2020 at 3:58 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Jul 13, 2020 at 3:40 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > Otherwise, is there any additional feedback on this series or is it good to go?
> >
> > I've lost sight of the series. I'm sure it is fine, but maybe you can
> > resend it to me (in private, if it's already been going out on the
> > mailing lists and everybody else is completely fed up with it).
>
> Is there a fast way for maintainers to amend ACKs to each commit in a series?

For future travelers (more so myself, since I don't sync my shell
history between machines, and I'm a big fan of aggressively sharing
knowledge. See also the section "Information as Power" and the
anecdote about tank manuals:
https://www.meforum.org/441/why-arabs-lose-wars). `b4` has a pretty
cool feature.  When I was fetching this series, it was warning:
```
NOTE: Some trailers were sent to the cover letter:
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
NOTE: Rerun with -t to apply them to all patches
```
So I did:
```
$ b4 am -t https://lore.kernel.org/lkml/CAKwvOdmsap8iB+H5JXiHYwSJFrtQ_krjNH7eQCGe7p-LjK7ftA@mail.gmail.com/T/\#t
-o - | git am
$ git filter-branch -f --msg-filter 'cat - && echo "Signed-off-by:
Nick Desaulniers <ndesaulniers@google.com>"" $@";' HEAD~10..HEAD
```

>
> >
> > And no, pointing to the "plus this hunk" with a web link isn't what I
> > was looking for ;)
>
> So you're not accepting pull requests yet on github? I jest.

Actually, looks like a lot of merged PRs come from github!  Grizzly
Adams *did* have a beard! https://www.youtube.com/watch?v=pdwJC9HvKLU

Sent as a series of emails via:
$ git format-patch -o linus_i386 HEAD~11
$ git send-email --to="Linus Torvalds <torvalds@linux-foundation.org>"
--suppress-cc=all linus_i386

Though, I'm sure a pull request would have been more formal.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] x86: Clean up percpu operations
  2020-07-14  0:31           ` Nick Desaulniers
@ 2020-07-14  1:29             ` Linus Torvalds
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2020-07-14  1:29 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H . Peter Anvin, Andy Lutomirski, Peter Zijlstra, Brian Gerst,
	Dmitry Golovin, Alistair Delva, Steven Rostedt,
	clang-built-linux

On Mon, Jul 13, 2020 at 5:31 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> So I did:

Ack, I have the series, and it looks fine to me.

> Actually, looks like a lot of merged PRs come from github!  Grizzly
> Adams *did* have a beard!

Oh, I think github is an _excellent_ hosting service.

I just can't stand their web workflow for creating commits and merging code.

At least at some point it was really easy to generate totally horrible
commit messages using the web interface without the proper header line
etc. I _think_ that got fixed. But the merge codeflow still doesn't
work at all for the kernel.

(To be fair, I've used it for _other_ projects, and there it can work
really really well, together with all the test infrastructure etc).

They also have a very spammy email system where people can add me to
their projects and "invite" me, and I get email about it.

But again - as a hosting site for kernel pulls, it's one of the better ones.

So you definitely don't need a kernel.org account to send me pull
requests, github is in fact much better than some others (infradead is
very slow to pull for me, for example).

A kernel.org pull doesn't strictly need a signed tag - I do kernel.org
pulls over ssh, and I trust the user security there. Any other hosting
service I _do_ require that the pull requests are proper signed tags,
though.

But even that difference is largely gone: for kernel.org pulls I've
very heavily favored signed tags, and I think almost all of them are
using them by now.

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-07-14  1:30 UTC | newest]

Thread overview: 32+ messages
2020-05-30 22:11 [PATCH v2 00/10] x86: Clean up percpu operations Brian Gerst
2020-05-30 22:11 ` [PATCH v2 01/10] x86/percpu: Introduce size abstraction macros Brian Gerst
2020-06-01 20:48   ` Nick Desaulniers
2020-05-30 22:11 ` [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op() Brian Gerst
2020-07-09 10:30   ` Peter Zijlstra
2020-07-10  4:38     ` Brian Gerst
2020-07-10  8:53       ` Peter Zijlstra
2020-07-10 16:56         ` Nick Desaulniers
2020-05-30 22:11 ` [PATCH v2 03/10] x86/percpu: Clean up percpu_from_op() Brian Gerst
2020-05-30 22:11 ` [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op() Brian Gerst
2020-06-01 20:18   ` Nick Desaulniers
2020-05-30 22:11 ` [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD Brian Gerst
2020-06-01 18:50   ` Nick Desaulniers
2020-05-30 22:11 ` [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op() Brian Gerst
2020-06-01 19:47   ` Nick Desaulniers
2020-05-30 22:11 ` [PATCH v2 07/10] x86/percpu: Clean up percpu_xchg_op() Brian Gerst
2020-05-30 22:11 ` [PATCH v2 08/10] x86/percpu: Clean up percpu_cmpxchg_op() Brian Gerst
2020-05-30 22:11 ` [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op() Brian Gerst
2020-06-01 20:43   ` Nick Desaulniers
2020-06-02 14:19     ` Brian Gerst
2020-07-09 10:25       ` Peter Zijlstra
2020-05-30 22:11 ` [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro Brian Gerst
2020-06-01 20:26   ` Nick Desaulniers
2020-06-01 21:00 ` [PATCH v2 00/10] x86: Clean up percpu operations Nick Desaulniers
2020-07-08 19:36   ` Nick Desaulniers
2020-07-09 20:02     ` Nick Desaulniers
2020-07-13 22:24     ` Nick Desaulniers
2020-07-13 22:40       ` Linus Torvalds
2020-07-13 22:58         ` Nick Desaulniers
2020-07-14  0:31           ` Nick Desaulniers
2020-07-14  1:29             ` Linus Torvalds
2020-07-09 10:30 ` Peter Zijlstra
