All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-04-13 16:38 ` Christophe Leroy
  0 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev

powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
forces GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implement zero-cycle branch,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.

On PPC64 we don't want it because we want to be able to use CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	0000046c <test9w>:
	 46c:	7c 89 00 34 	cntlzw  r9,r4
	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
	 474:	0f 09 00 00 	twnei   r9,0
	 478:	2c 04 00 00 	cmpwi   r4,0
	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
	 480:	7c 63 23 96 	divwu   r3,r3,r4
	 484:	4e 80 00 20 	blr

	 488:	38 60 00 00 	li      r3,0
	 48c:	4e 80 00 20 	blr

After the patch:

	00000468 <test9w>:
	 468:	2c 04 00 00 	cmpwi   r4,0
	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
	 470:	7c 63 23 96 	divwu   r3,r3,r4
	 474:	4e 80 00 20 	blr

	 478:	0f e0 00 00 	twui    r0,0
	 47c:	38 60 00 00 	li      r3,0
	 480:	4e 80 00 20 	blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), allthough less impressing.

With the patch:

	00000000 <system_call_exception>:
	   0:	81 6a 00 84 	lwz     r11,132(r10)
	   4:	90 6a 00 88 	stw     r3,136(r10)
	   8:	71 60 00 02 	andi.   r0,r11,2
	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
	  10:	71 60 40 00 	andi.   r0,r11,16384
	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
	  18:	71 6b 80 00 	andi.   r11,r11,32768
	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
	  20:	94 21 ff e0 	stwu    r1,-32(r1)
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7d 8c 42 e6 	mftb    r12
	...
	  7c:	0f e0 00 00 	twui    r0,0
	  80:	0f e0 00 00 	twui    r0,0
	  84:	0f e0 00 00 	twui    r0,0

Without the patch:

	00000000 <system_call_exception>:
	   0:	94 21 ff e0 	stwu    r1,-32(r1)
	   4:	93 e1 00 1c 	stw     r31,28(r1)
	   8:	90 6a 00 88 	stw     r3,136(r10)
	   c:	81 6a 00 84 	lwz     r11,132(r10)
	  10:	69 60 00 02 	xori    r0,r11,2
	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
	  18:	0f 00 00 00 	twnei   r0,0
	  1c:	69 60 40 00 	xori    r0,r11,16384
	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
	  24:	0f 00 00 00 	twnei   r0,0
	  28:	69 6b 80 00 	xori    r11,r11,32768
	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
	  30:	0f 0b 00 00 	twnei   r11,0
	  34:	7d 8c 42 e6 	mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/bug.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index d1635ffbb179..101dea4eec8d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -68,7 +68,11 @@
 	BUG_ENTRY("twi 31, 0, 0", 0);				\
 	unreachable();						\
 } while (0)
+#define HAVE_ARCH_BUG
+
+#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
 
+#ifdef CONFIG_PPC64
 #define BUG_ON(x) do {						\
 	if (__builtin_constant_p(x)) {				\
 		if (x)						\
@@ -78,8 +82,6 @@
 	}							\
 } while (0)
 
-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
-
 #define WARN_ON(x) ({						\
 	int __ret_warn_on = !!(x);				\
 	if (__builtin_constant_p(__ret_warn_on)) {		\
@@ -93,9 +95,10 @@
 	unlikely(__ret_warn_on);				\
 })
 
-#define HAVE_ARCH_BUG
 #define HAVE_ARCH_BUG_ON
 #define HAVE_ARCH_WARN_ON
+#endif
+
 #endif /* __ASSEMBLY __ */
 #else
 #ifdef __ASSEMBLY__
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-04-13 16:38 ` Christophe Leroy
  0 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
forces GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implement zero-cycle branch,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.

On PPC64 we don't want it because we want to be able to use CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	0000046c <test9w>:
	 46c:	7c 89 00 34 	cntlzw  r9,r4
	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
	 474:	0f 09 00 00 	twnei   r9,0
	 478:	2c 04 00 00 	cmpwi   r4,0
	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
	 480:	7c 63 23 96 	divwu   r3,r3,r4
	 484:	4e 80 00 20 	blr

	 488:	38 60 00 00 	li      r3,0
	 48c:	4e 80 00 20 	blr

After the patch:

	00000468 <test9w>:
	 468:	2c 04 00 00 	cmpwi   r4,0
	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
	 470:	7c 63 23 96 	divwu   r3,r3,r4
	 474:	4e 80 00 20 	blr

	 478:	0f e0 00 00 	twui    r0,0
	 47c:	38 60 00 00 	li      r3,0
	 480:	4e 80 00 20 	blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), allthough less impressing.

With the patch:

	00000000 <system_call_exception>:
	   0:	81 6a 00 84 	lwz     r11,132(r10)
	   4:	90 6a 00 88 	stw     r3,136(r10)
	   8:	71 60 00 02 	andi.   r0,r11,2
	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
	  10:	71 60 40 00 	andi.   r0,r11,16384
	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
	  18:	71 6b 80 00 	andi.   r11,r11,32768
	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
	  20:	94 21 ff e0 	stwu    r1,-32(r1)
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7d 8c 42 e6 	mftb    r12
	...
	  7c:	0f e0 00 00 	twui    r0,0
	  80:	0f e0 00 00 	twui    r0,0
	  84:	0f e0 00 00 	twui    r0,0

Without the patch:

	00000000 <system_call_exception>:
	   0:	94 21 ff e0 	stwu    r1,-32(r1)
	   4:	93 e1 00 1c 	stw     r31,28(r1)
	   8:	90 6a 00 88 	stw     r3,136(r10)
	   c:	81 6a 00 84 	lwz     r11,132(r10)
	  10:	69 60 00 02 	xori    r0,r11,2
	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
	  18:	0f 00 00 00 	twnei   r0,0
	  1c:	69 60 40 00 	xori    r0,r11,16384
	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
	  24:	0f 00 00 00 	twnei   r0,0
	  28:	69 6b 80 00 	xori    r11,r11,32768
	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
	  30:	0f 0b 00 00 	twnei   r11,0
	  34:	7d 8c 42 e6 	mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/bug.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index d1635ffbb179..101dea4eec8d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -68,7 +68,11 @@
 	BUG_ENTRY("twi 31, 0, 0", 0);				\
 	unreachable();						\
 } while (0)
+#define HAVE_ARCH_BUG
+
+#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
 
+#ifdef CONFIG_PPC64
 #define BUG_ON(x) do {						\
 	if (__builtin_constant_p(x)) {				\
 		if (x)						\
@@ -78,8 +82,6 @@
 	}							\
 } while (0)
 
-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
-
 #define WARN_ON(x) ({						\
 	int __ret_warn_on = !!(x);				\
 	if (__builtin_constant_p(__ret_warn_on)) {		\
@@ -93,9 +95,10 @@
 	unlikely(__ret_warn_on);				\
 })
 
-#define HAVE_ARCH_BUG
 #define HAVE_ARCH_BUG_ON
 #define HAVE_ARCH_WARN_ON
+#endif
+
 #endif /* __ASSEMBLY __ */
 #else
 #ifdef __ASSEMBLY__
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-04-13 16:38 ` Christophe Leroy
@ 2021-04-13 16:38   ` Christophe Leroy
  -1 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev

Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

	unsigned long test(struct pt_regs *regs)
	{
		int ret;

		WARN_ON(regs->msr & MSR_PR);

		return regs->gpr[3];
	}

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0
	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
	 3c4:	4e 80 00 20 	blr

	0000000000000bf0 <.test9w>:
	 bf0:	7c 89 00 74 	cntlzd  r9,r4
	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
	 bf8:	0b 09 00 00 	tdnei   r9,0
	 bfc:	2c 24 00 00 	cmpdi   r4,0
	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
	 c04:	7c 63 23 92 	divdu   r3,r3,r4
	 c08:	4e 80 00 20 	blr

	 c0c:	38 60 00 00 	li      r3,0
	 c10:	4e 80 00 20 	blr

After the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0

	0000000000000c50 <.test9w>:
	 c50:	7c 89 00 74 	cntlzd  r9,r4
	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
	 c58:	0b 09 00 00 	tdnei   r9,0
	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
	 c60:	4e 80 00 20 	blr

	 c70:	38 60 00 00 	li      r3,0
	 c74:	4e 80 00 20 	blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Fix build failure when CONFIG_BUG is not selected.
---
 arch/powerpc/include/asm/book3s/64/kup.h |  2 +-
 arch/powerpc/include/asm/bug.h           | 54 ++++++++++++++++++++----
 arch/powerpc/include/asm/extable.h       | 14 ++++++
 arch/powerpc/include/asm/ppc_asm.h       | 11 +----
 arch/powerpc/kernel/entry_64.S           |  2 +-
 arch/powerpc/kernel/exceptions-64e.S     |  2 +-
 arch/powerpc/kernel/misc_32.S            |  2 +-
 arch/powerpc/kernel/traps.c              |  9 +++-
 scripts/mod/modpost.c                    |  2 +-
 9 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 9700da3a4093..a22839cba32e 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -90,7 +90,7 @@
 	/* Prevent access to userspace using any key values */
 	LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
 999:	tdne	\gpr1, \gpr2
-	EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
+	EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
 	END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67)
 #endif
 .endm
diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 101dea4eec8d..e22dc503fb2f 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #include <asm/asm-compat.h>
+#include <asm/extable.h>
 
 #ifdef CONFIG_BUG
 
@@ -30,6 +31,11 @@
 .endm
 #endif /* verbose */
 
+.macro EMIT_WARN_ENTRY addr,file,line,flags
+	EX_TABLE(\addr,\addr+4)
+	EMIT_BUG_ENTRY \addr,\file,\line,\flags
+.endm
+
 #else /* !__ASSEMBLY__ */
 /* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and
    sizeof(struct bug_entry), respectively */
@@ -58,6 +64,16 @@
 		  "i" (sizeof(struct bug_entry)),	\
 		  ##__VA_ARGS__)
 
+#define WARN_ENTRY(insn, flags, label, ...)		\
+	asm_volatile_goto(				\
+		"1:	" insn "\n"			\
+		EX_TABLE(1b, %l[label])			\
+		_EMIT_BUG_ENTRY				\
+		: : "i" (__FILE__), "i" (__LINE__),	\
+		  "i" (flags),				\
+		  "i" (sizeof(struct bug_entry)),	\
+		  ##__VA_ARGS__ : : label)
+
 /*
  * BUG_ON() and WARN_ON() do their best to cooperate with compile-time
  * optimisations. However depending on the complexity of the condition
@@ -70,7 +86,15 @@
 } while (0)
 #define HAVE_ARCH_BUG
 
-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
+#define __WARN_FLAGS(flags) do {				\
+	__label__ __label_warn_on;				\
+								\
+	WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags), __label_warn_on); \
+	unreachable();						\
+								\
+__label_warn_on:						\
+	break;							\
+} while (0)
 
 #ifdef CONFIG_PPC64
 #define BUG_ON(x) do {						\
@@ -83,15 +107,24 @@
 } while (0)
 
 #define WARN_ON(x) ({						\
-	int __ret_warn_on = !!(x);				\
-	if (__builtin_constant_p(__ret_warn_on)) {		\
-		if (__ret_warn_on)				\
+	bool __ret_warn_on = false;				\
+	do {							\
+		if (__builtin_constant_p((x))) {		\
+			if (!(x)) 				\
+				break; 				\
 			__WARN();				\
-	} else {						\
-		BUG_ENTRY(PPC_TLNEI " %4, 0",			\
-			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-			  "r" (__ret_warn_on));	\
-	}							\
+			__ret_warn_on = true;			\
+		} else {					\
+			__label__ __label_warn_on;		\
+								\
+			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
+				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
+				   __label_warn_on, "r" (x));	\
+			break;					\
+__label_warn_on:						\
+			__ret_warn_on = true;			\
+		}						\
+	} while (0);						\
 	unlikely(__ret_warn_on);				\
 })
 
@@ -104,8 +137,11 @@
 #ifdef __ASSEMBLY__
 .macro EMIT_BUG_ENTRY addr,file,line,flags
 .endm
+.macro EMIT_WARN_ENTRY addr,file,line,flags
+.endm
 #else /* !__ASSEMBLY__ */
 #define _EMIT_BUG_ENTRY
+#define _EMIT_WARN_ENTRY
 #endif
 #endif /* CONFIG_BUG */
 
diff --git a/arch/powerpc/include/asm/extable.h b/arch/powerpc/include/asm/extable.h
index eb91b2d2935a..26ce2e5c0fa8 100644
--- a/arch/powerpc/include/asm/extable.h
+++ b/arch/powerpc/include/asm/extable.h
@@ -17,6 +17,8 @@
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#ifndef __ASSEMBLY__
+
 struct exception_table_entry {
 	int insn;
 	int fixup;
@@ -28,3 +30,15 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x)
 }
 
 #endif
+
+/*
+ * Helper macro for exception table entries
+ */
+#define EX_TABLE(_fault, _target)		\
+	stringify_in_c(.section __ex_table,"a";)\
+	stringify_in_c(.balign 4;)		\
+	stringify_in_c(.long (_fault) - . ;)	\
+	stringify_in_c(.long (_target) - . ;)	\
+	stringify_in_c(.previous)
+
+#endif
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 8998122fc7e2..a74e1561535a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -10,6 +10,7 @@
 #include <asm/ppc-opcode.h>
 #include <asm/firmware.h>
 #include <asm/feature-fixups.h>
+#include <asm/extable.h>
 
 #ifdef __ASSEMBLY__
 
@@ -772,16 +773,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 
 #endif /*  __ASSEMBLY__ */
 
-/*
- * Helper macro for exception table entries
- */
-#define EX_TABLE(_fault, _target)		\
-	stringify_in_c(.section __ex_table,"a";)\
-	stringify_in_c(.balign 4;)		\
-	stringify_in_c(.long (_fault) - . ;)	\
-	stringify_in_c(.long (_target) - . ;)	\
-	stringify_in_c(.previous)
-
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #define BTB_FLUSH(reg)			\
 	lis reg,BUCSR_INIT@h;		\
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6c4d9e276c4d..faa64cc90f02 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -835,7 +835,7 @@ _GLOBAL(enter_rtas)
 	 */
 	lbz	r0,PACAIRQSOFTMASK(r13)
 1:	tdeqi	r0,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
 	/* Hard-disable interrupts */
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index e8eb9992a270..3f8c51390a04 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1249,7 +1249,7 @@ fast_exception_return:
 	/* The interrupt should not have soft enabled. */
 	lbz	r7,PACAIRQSOFTMASK(r13)
 1:	tdeqi	r7,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 	b	fast_exception_return
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 6a076bef2932..21390f3119a9 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -237,7 +237,7 @@ _GLOBAL(copy_page)
 	addi	r3,r3,-4
 
 0:	twnei	r5, 0	/* WARN if r3 is not cache aligned */
-	EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
 
 	addi	r4,r4,-4
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index efba99870691..ee40a637313d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1467,8 +1467,13 @@ static void do_program_check(struct pt_regs *regs)
 
 		if (!(regs->msr & MSR_PR) &&  /* not user-mode */
 		    report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
-			regs->nip += 4;
-			return;
+			const struct exception_table_entry *entry;
+
+			entry = search_exception_tables(bugaddr);
+			if (entry) {
+				regs->nip = extable_fixup(entry) + regs->nip - bugaddr;
+				return;
+			}
 		}
 		_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
 		return;
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 24725e50c7b4..34745f239208 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
 		".kprobes.text", ".cpuidle.text", ".noinstr.text"
 #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
 		".fixup", ".entry.text", ".exception.text", ".text.*", \
-		".coldtext"
+		".coldtext .softirqentry.text"
 
 #define INIT_SECTIONS      ".init.*"
 #define MEM_INIT_SECTIONS  ".meminit.*"
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-04-13 16:38   ` Christophe Leroy
  0 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

	unsigned long test(struct pt_regs *regs)
	{
		int ret;

		WARN_ON(regs->msr & MSR_PR);

		return regs->gpr[3];
	}

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0
	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
	 3c4:	4e 80 00 20 	blr

	0000000000000bf0 <.test9w>:
	 bf0:	7c 89 00 74 	cntlzd  r9,r4
	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
	 bf8:	0b 09 00 00 	tdnei   r9,0
	 bfc:	2c 24 00 00 	cmpdi   r4,0
	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
	 c04:	7c 63 23 92 	divdu   r3,r3,r4
	 c08:	4e 80 00 20 	blr

	 c0c:	38 60 00 00 	li      r3,0
	 c10:	4e 80 00 20 	blr

After the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0

	0000000000000c50 <.test9w>:
	 c50:	7c 89 00 74 	cntlzd  r9,r4
	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
	 c58:	0b 09 00 00 	tdnei   r9,0
	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
	 c60:	4e 80 00 20 	blr

	 c70:	38 60 00 00 	li      r3,0
	 c74:	4e 80 00 20 	blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Fix build failure when CONFIG_BUG is not selected.
---
 arch/powerpc/include/asm/book3s/64/kup.h |  2 +-
 arch/powerpc/include/asm/bug.h           | 54 ++++++++++++++++++++----
 arch/powerpc/include/asm/extable.h       | 14 ++++++
 arch/powerpc/include/asm/ppc_asm.h       | 11 +----
 arch/powerpc/kernel/entry_64.S           |  2 +-
 arch/powerpc/kernel/exceptions-64e.S     |  2 +-
 arch/powerpc/kernel/misc_32.S            |  2 +-
 arch/powerpc/kernel/traps.c              |  9 +++-
 scripts/mod/modpost.c                    |  2 +-
 9 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 9700da3a4093..a22839cba32e 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -90,7 +90,7 @@
 	/* Prevent access to userspace using any key values */
 	LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
 999:	tdne	\gpr1, \gpr2
-	EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
+	EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
 	END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67)
 #endif
 .endm
diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 101dea4eec8d..e22dc503fb2f 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #include <asm/asm-compat.h>
+#include <asm/extable.h>
 
 #ifdef CONFIG_BUG
 
@@ -30,6 +31,11 @@
 .endm
 #endif /* verbose */
 
+.macro EMIT_WARN_ENTRY addr,file,line,flags
+	EX_TABLE(\addr,\addr+4)
+	EMIT_BUG_ENTRY \addr,\file,\line,\flags
+.endm
+
 #else /* !__ASSEMBLY__ */
 /* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and
    sizeof(struct bug_entry), respectively */
@@ -58,6 +64,16 @@
 		  "i" (sizeof(struct bug_entry)),	\
 		  ##__VA_ARGS__)
 
+#define WARN_ENTRY(insn, flags, label, ...)		\
+	asm_volatile_goto(				\
+		"1:	" insn "\n"			\
+		EX_TABLE(1b, %l[label])			\
+		_EMIT_BUG_ENTRY				\
+		: : "i" (__FILE__), "i" (__LINE__),	\
+		  "i" (flags),				\
+		  "i" (sizeof(struct bug_entry)),	\
+		  ##__VA_ARGS__ : : label)
+
 /*
  * BUG_ON() and WARN_ON() do their best to cooperate with compile-time
  * optimisations. However depending on the complexity of the condition
@@ -70,7 +86,15 @@
 } while (0)
 #define HAVE_ARCH_BUG
 
-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
+#define __WARN_FLAGS(flags) do {				\
+	__label__ __label_warn_on;				\
+								\
+	WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags), __label_warn_on); \
+	unreachable();						\
+								\
+__label_warn_on:						\
+	break;							\
+} while (0)
 
 #ifdef CONFIG_PPC64
 #define BUG_ON(x) do {						\
@@ -83,15 +107,24 @@
 } while (0)
 
 #define WARN_ON(x) ({						\
-	int __ret_warn_on = !!(x);				\
-	if (__builtin_constant_p(__ret_warn_on)) {		\
-		if (__ret_warn_on)				\
+	bool __ret_warn_on = false;				\
+	do {							\
+		if (__builtin_constant_p((x))) {		\
+			if (!(x)) 				\
+				break; 				\
 			__WARN();				\
-	} else {						\
-		BUG_ENTRY(PPC_TLNEI " %4, 0",			\
-			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-			  "r" (__ret_warn_on));	\
-	}							\
+			__ret_warn_on = true;			\
+		} else {					\
+			__label__ __label_warn_on;		\
+								\
+			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
+				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
+				   __label_warn_on, "r" (x));	\
+			break;					\
+__label_warn_on:						\
+			__ret_warn_on = true;			\
+		}						\
+	} while (0);						\
 	unlikely(__ret_warn_on);				\
 })
 
@@ -104,8 +137,11 @@
 #ifdef __ASSEMBLY__
 .macro EMIT_BUG_ENTRY addr,file,line,flags
 .endm
+.macro EMIT_WARN_ENTRY addr,file,line,flags
+.endm
 #else /* !__ASSEMBLY__ */
 #define _EMIT_BUG_ENTRY
+#define _EMIT_WARN_ENTRY
 #endif
 #endif /* CONFIG_BUG */
 
diff --git a/arch/powerpc/include/asm/extable.h b/arch/powerpc/include/asm/extable.h
index eb91b2d2935a..26ce2e5c0fa8 100644
--- a/arch/powerpc/include/asm/extable.h
+++ b/arch/powerpc/include/asm/extable.h
@@ -17,6 +17,8 @@
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#ifndef __ASSEMBLY__
+
 struct exception_table_entry {
 	int insn;
 	int fixup;
@@ -28,3 +30,15 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x)
 }
 
 #endif
+
+/*
+ * Helper macro for exception table entries
+ */
+#define EX_TABLE(_fault, _target)		\
+	stringify_in_c(.section __ex_table,"a";)\
+	stringify_in_c(.balign 4;)		\
+	stringify_in_c(.long (_fault) - . ;)	\
+	stringify_in_c(.long (_target) - . ;)	\
+	stringify_in_c(.previous)
+
+#endif
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 8998122fc7e2..a74e1561535a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -10,6 +10,7 @@
 #include <asm/ppc-opcode.h>
 #include <asm/firmware.h>
 #include <asm/feature-fixups.h>
+#include <asm/extable.h>
 
 #ifdef __ASSEMBLY__
 
@@ -772,16 +773,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 
 #endif /*  __ASSEMBLY__ */
 
-/*
- * Helper macro for exception table entries
- */
-#define EX_TABLE(_fault, _target)		\
-	stringify_in_c(.section __ex_table,"a";)\
-	stringify_in_c(.balign 4;)		\
-	stringify_in_c(.long (_fault) - . ;)	\
-	stringify_in_c(.long (_target) - . ;)	\
-	stringify_in_c(.previous)
-
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #define BTB_FLUSH(reg)			\
 	lis reg,BUCSR_INIT@h;		\
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6c4d9e276c4d..faa64cc90f02 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -835,7 +835,7 @@ _GLOBAL(enter_rtas)
 	 */
 	lbz	r0,PACAIRQSOFTMASK(r13)
 1:	tdeqi	r0,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
 	/* Hard-disable interrupts */
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index e8eb9992a270..3f8c51390a04 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1249,7 +1249,7 @@ fast_exception_return:
 	/* The interrupt should not have soft enabled. */
 	lbz	r7,PACAIRQSOFTMASK(r13)
 1:	tdeqi	r7,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 	b	fast_exception_return
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 6a076bef2932..21390f3119a9 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -237,7 +237,7 @@ _GLOBAL(copy_page)
 	addi	r3,r3,-4
 
 0:	twnei	r5, 0	/* WARN if r3 is not cache aligned */
-	EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+	EMIT_WARN_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
 
 	addi	r4,r4,-4
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index efba99870691..ee40a637313d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1467,8 +1467,13 @@ static void do_program_check(struct pt_regs *regs)
 
 		if (!(regs->msr & MSR_PR) &&  /* not user-mode */
 		    report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
-			regs->nip += 4;
-			return;
+			const struct exception_table_entry *entry;
+
+			entry = search_exception_tables(bugaddr);
+			if (entry) {
+				regs->nip = extable_fixup(entry) + regs->nip - bugaddr;
+				return;
+			}
 		}
 		_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
 		return;
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 24725e50c7b4..34745f239208 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
 		".kprobes.text", ".cpuidle.text", ".noinstr.text"
 #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
 		".fixup", ".entry.text", ".exception.text", ".text.*", \
-		".coldtext"
+		".coldtext .softirqentry.text"
 
 #define INIT_SECTIONS      ".init.*"
 #define MEM_INIT_SECTIONS  ".meminit.*"
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-04-13 16:38 ` Christophe Leroy
@ 2021-08-13  6:08   ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-13  6:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linux-kernel, linuxppc-dev

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> The most efficient solution would be to use __builtin_trap() because
> GCC is able to optimise the use of the different trap instructions
> based on the requested condition, but this is complex if not
> impossible for the following reasons:
> - __builtin_trap() is a non-recoverable instruction, so it can't be
> used for WARN_ON
> - Knowing which line of code generated the trap would require the
> analysis of DWARF information. This is not a feature we have today.
> 
> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
> the way WARN_ON() is implemented is suboptimal. That commit also
> mentions an issue with 'long long' condition. It fixed it for
> WARN_ON() but the same problem still exists today with BUG_ON() on
> PPC32. It will be fixed by using the generic implementation.
> 
> By using the generic implementation, gcc will naturally generate a
> branch to the unconditional trap generated by BUG().
> 
> As modern powerpc implement zero-cycle branch,
> that's even more efficient.
> 
> And for the functions using WARN_ON() and its return, the test
> on return from WARN_ON() is now also used for the WARN_ON() itself.
> 
> On PPC64 we don't want it because we want to be able to use CFAR
> register to track how we entered the code that trapped. The CFAR
> register would be clobbered by the branch.
> 
> A simple test function:
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	0000046c <test9w>:
> 	 46c:	7c 89 00 34 	cntlzw  r9,r4
> 	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
> 	 474:	0f 09 00 00 	twnei   r9,0
> 	 478:	2c 04 00 00 	cmpwi   r4,0
> 	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
> 	 480:	7c 63 23 96 	divwu   r3,r3,r4
> 	 484:	4e 80 00 20 	blr
> 
> 	 488:	38 60 00 00 	li      r3,0
> 	 48c:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	00000468 <test9w>:
> 	 468:	2c 04 00 00 	cmpwi   r4,0
> 	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
> 	 470:	7c 63 23 96 	divwu   r3,r3,r4
> 	 474:	4e 80 00 20 	blr
> 
> 	 478:	0f e0 00 00 	twui    r0,0
> 	 47c:	38 60 00 00 	li      r3,0
> 	 480:	4e 80 00 20 	blr

That's clearly better because we have a branch anyway.

> 
> So we see before the patch we need 3 instructions on the likely path
> to handle the WARN_ON(). With the patch the trap goes on the unlikely
> path.
> 
> See below the difference at the entry of system_call_exception where
> we have several BUG_ON(), allthough less impressing.
> 
> With the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	81 6a 00 84 	lwz     r11,132(r10)
> 	   4:	90 6a 00 88 	stw     r3,136(r10)
> 	   8:	71 60 00 02 	andi.   r0,r11,2
> 	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
> 	  10:	71 60 40 00 	andi.   r0,r11,16384
> 	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
> 	  18:	71 6b 80 00 	andi.   r11,r11,32768
> 	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
> 	  20:	94 21 ff e0 	stwu    r1,-32(r1)
> 	  24:	93 e1 00 1c 	stw     r31,28(r1)
> 	  28:	7d 8c 42 e6 	mftb    r12
> 	...
> 	  7c:	0f e0 00 00 	twui    r0,0
> 	  80:	0f e0 00 00 	twui    r0,0
> 	  84:	0f e0 00 00 	twui    r0,0
> 
> Without the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	94 21 ff e0 	stwu    r1,-32(r1)
> 	   4:	93 e1 00 1c 	stw     r31,28(r1)
> 	   8:	90 6a 00 88 	stw     r3,136(r10)
> 	   c:	81 6a 00 84 	lwz     r11,132(r10)
> 	  10:	69 60 00 02 	xori    r0,r11,2
> 	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
> 	  18:	0f 00 00 00 	twnei   r0,0
> 	  1c:	69 60 40 00 	xori    r0,r11,16384
> 	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
> 	  24:	0f 00 00 00 	twnei   r0,0
> 	  28:	69 6b 80 00 	xori    r11,r11,32768
> 	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
> 	  30:	0f 0b 00 00 	twnei   r11,0
> 	  34:	7d 8c 42 e6 	mftb    r12

This one possibly the branches end up in predictors, whereas conditional 
trap is always just speculated not to hit. Branches may also have a
throughput limit on execution whereas trap could be more (1 per cycle
vs 4 per cycle on POWER9).

On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
is the CFAR issue as well which makes it a problem for 64s. It would
have been nice if it could use the same code though.

Maybe one day gcc's __builtin_trap() will become smart enough around
conditional statements that it it generates better code and tries to
avoid branches.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-13  6:08   ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-13  6:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> The most efficient solution would be to use __builtin_trap() because
> GCC is able to optimise the use of the different trap instructions
> based on the requested condition, but this is complex if not
> impossible for the following reasons:
> - __builtin_trap() is a non-recoverable instruction, so it can't be
> used for WARN_ON
> - Knowing which line of code generated the trap would require the
> analysis of DWARF information. This is not a feature we have today.
> 
> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
> the way WARN_ON() is implemented is suboptimal. That commit also
> mentions an issue with 'long long' condition. It fixed it for
> WARN_ON() but the same problem still exists today with BUG_ON() on
> PPC32. It will be fixed by using the generic implementation.
> 
> By using the generic implementation, gcc will naturally generate a
> branch to the unconditional trap generated by BUG().
> 
> As modern powerpc implement zero-cycle branch,
> that's even more efficient.
> 
> And for the functions using WARN_ON() and its return, the test
> on return from WARN_ON() is now also used for the WARN_ON() itself.
> 
> On PPC64 we don't want it because we want to be able to use CFAR
> register to track how we entered the code that trapped. The CFAR
> register would be clobbered by the branch.
> 
> A simple test function:
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	0000046c <test9w>:
> 	 46c:	7c 89 00 34 	cntlzw  r9,r4
> 	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
> 	 474:	0f 09 00 00 	twnei   r9,0
> 	 478:	2c 04 00 00 	cmpwi   r4,0
> 	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
> 	 480:	7c 63 23 96 	divwu   r3,r3,r4
> 	 484:	4e 80 00 20 	blr
> 
> 	 488:	38 60 00 00 	li      r3,0
> 	 48c:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	00000468 <test9w>:
> 	 468:	2c 04 00 00 	cmpwi   r4,0
> 	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
> 	 470:	7c 63 23 96 	divwu   r3,r3,r4
> 	 474:	4e 80 00 20 	blr
> 
> 	 478:	0f e0 00 00 	twui    r0,0
> 	 47c:	38 60 00 00 	li      r3,0
> 	 480:	4e 80 00 20 	blr

That's clearly better because we have a branch anyway.

> 
> So we see before the patch we need 3 instructions on the likely path
> to handle the WARN_ON(). With the patch the trap goes on the unlikely
> path.
> 
> See below the difference at the entry of system_call_exception where
> we have several BUG_ON(), allthough less impressing.
> 
> With the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	81 6a 00 84 	lwz     r11,132(r10)
> 	   4:	90 6a 00 88 	stw     r3,136(r10)
> 	   8:	71 60 00 02 	andi.   r0,r11,2
> 	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
> 	  10:	71 60 40 00 	andi.   r0,r11,16384
> 	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
> 	  18:	71 6b 80 00 	andi.   r11,r11,32768
> 	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
> 	  20:	94 21 ff e0 	stwu    r1,-32(r1)
> 	  24:	93 e1 00 1c 	stw     r31,28(r1)
> 	  28:	7d 8c 42 e6 	mftb    r12
> 	...
> 	  7c:	0f e0 00 00 	twui    r0,0
> 	  80:	0f e0 00 00 	twui    r0,0
> 	  84:	0f e0 00 00 	twui    r0,0
> 
> Without the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	94 21 ff e0 	stwu    r1,-32(r1)
> 	   4:	93 e1 00 1c 	stw     r31,28(r1)
> 	   8:	90 6a 00 88 	stw     r3,136(r10)
> 	   c:	81 6a 00 84 	lwz     r11,132(r10)
> 	  10:	69 60 00 02 	xori    r0,r11,2
> 	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
> 	  18:	0f 00 00 00 	twnei   r0,0
> 	  1c:	69 60 40 00 	xori    r0,r11,16384
> 	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
> 	  24:	0f 00 00 00 	twnei   r0,0
> 	  28:	69 6b 80 00 	xori    r11,r11,32768
> 	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
> 	  30:	0f 0b 00 00 	twnei   r11,0
> 	  34:	7d 8c 42 e6 	mftb    r12

This one possibly the branches end up in predictors, whereas conditional 
trap is always just speculated not to hit. Branches may also have a
throughput limit on execution whereas trap could be more (1 per cycle
vs 4 per cycle on POWER9).

On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
is the CFAR issue as well which makes it a problem for 64s. It would
have been nice if it could use the same code though.

Maybe one day gcc's __builtin_trap() will become smart enough around
conditional statements that it it generates better code and tries to
avoid branches.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-04-13 16:38   ` Christophe Leroy
@ 2021-08-13  6:19     ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-13  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linux-kernel, linuxppc-dev

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
> 
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.

Nice idea. How much does it bloat the exception table?
How easy would it be to make a different exception table for
unimportant stuff like WARN_ON faults?

> 
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
> 
> 	unsigned long test(struct pt_regs *regs)
> 	{
> 		int ret;
> 
> 		WARN_ON(regs->msr & MSR_PR);
> 
> 		return regs->gpr[3];
> 	}
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3c4:	4e 80 00 20 	blr
> 
> 	0000000000000bf0 <.test9w>:
> 	 bf0:	7c 89 00 74 	cntlzd  r9,r4
> 	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 bf8:	0b 09 00 00 	tdnei   r9,0
> 	 bfc:	2c 24 00 00 	cmpdi   r4,0
> 	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
> 	 c04:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c08:	4e 80 00 20 	blr
> 
> 	 c0c:	38 60 00 00 	li      r3,0
> 	 c10:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 
> 	0000000000000c50 <.test9w>:
> 	 c50:	7c 89 00 74 	cntlzd  r9,r4
> 	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 c58:	0b 09 00 00 	tdnei   r9,0
> 	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c60:	4e 80 00 20 	blr
> 
> 	 c70:	38 60 00 00 	li      r3,0
> 	 c74:	4e 80 00 20 	blr

One thing that would be nice for WARN_ON_ONCE is to patch out the trap
after the program interrupt. I've seen sometimes the WARN_ON_ONCE starts
firing a lot and slows down the kernel. That gets harder to do if the
trap is now part of the control flow.

I guess that's less important case for performance though. And in theory
you might be able to replace the trap with an equivalent cmp and branch
(but that's probably going too over engineering it).

Thanks,
Nick


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-13  6:19     ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-13  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
> 
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.

Nice idea. How much does it bloat the exception table?
How easy would it be to make a different exception table for
unimportant stuff like WARN_ON faults?

> 
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
> 
> 	unsigned long test(struct pt_regs *regs)
> 	{
> 		int ret;
> 
> 		WARN_ON(regs->msr & MSR_PR);
> 
> 		return regs->gpr[3];
> 	}
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3c4:	4e 80 00 20 	blr
> 
> 	0000000000000bf0 <.test9w>:
> 	 bf0:	7c 89 00 74 	cntlzd  r9,r4
> 	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 bf8:	0b 09 00 00 	tdnei   r9,0
> 	 bfc:	2c 24 00 00 	cmpdi   r4,0
> 	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
> 	 c04:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c08:	4e 80 00 20 	blr
> 
> 	 c0c:	38 60 00 00 	li      r3,0
> 	 c10:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 
> 	0000000000000c50 <.test9w>:
> 	 c50:	7c 89 00 74 	cntlzd  r9,r4
> 	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 c58:	0b 09 00 00 	tdnei   r9,0
> 	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c60:	4e 80 00 20 	blr
> 
> 	 c70:	38 60 00 00 	li      r3,0
> 	 c74:	4e 80 00 20 	blr

One thing that would be nice for WARN_ON_ONCE is to patch out the trap
after the program interrupt. I've seen sometimes the WARN_ON_ONCE starts
firing a lot and slows down the kernel. That gets harder to do if the
trap is now part of the control flow.

I guess that's less important case for performance though. And in theory
you might be able to replace the trap with an equivalent cmp and branch
(but that's probably going too over engineering it).

Thanks,
Nick


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-04-13 16:38   ` Christophe Leroy
@ 2021-08-15  3:49     ` Michael Ellerman
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-15  3:49 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 24725e50c7b4..34745f239208 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
>  		".kprobes.text", ".cpuidle.text", ".noinstr.text"
>  #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
>  		".fixup", ".entry.text", ".exception.text", ".text.*", \
> -		".coldtext"
> +		".coldtext .softirqentry.text"

This wasn't working, I updated it to:

               ".coldtext", ".softirqentry.text"

Which works.

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-15  3:49     ` Michael Ellerman
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-15  3:49 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 24725e50c7b4..34745f239208 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
>  		".kprobes.text", ".cpuidle.text", ".noinstr.text"
>  #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
>  		".fixup", ".entry.text", ".exception.text", ".text.*", \
> -		".coldtext"
> +		".coldtext .softirqentry.text"

This wasn't working, I updated it to:

               ".coldtext", ".softirqentry.text"

Which works.

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-04-13 16:38 ` Christophe Leroy
@ 2021-08-18 13:38   ` Michael Ellerman
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-18 13:38 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras
  Cc: linux-kernel, linuxppc-dev

On Tue, 13 Apr 2021 16:38:09 +0000 (UTC), Christophe Leroy wrote:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
      https://git.kernel.org/powerpc/c/db87a7199229b75c9996bf78117eceb81854fce2
[2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
      https://git.kernel.org/powerpc/c/1e688dd2a3d6759d416616ff07afc4bb836c4213

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-18 13:38   ` Michael Ellerman
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-18 13:38 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel

On Tue, 13 Apr 2021 16:38:09 +0000 (UTC), Christophe Leroy wrote:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
      https://git.kernel.org/powerpc/c/db87a7199229b75c9996bf78117eceb81854fce2
[2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
      https://git.kernel.org/powerpc/c/1e688dd2a3d6759d416616ff07afc4bb836c4213

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-13  6:08   ` Nicholas Piggin
@ 2021-08-18 15:06     ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-18 15:06 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> This one possibly the branches end up in predictors, whereas conditional 
> trap is always just speculated not to hit. Branches may also have a
> throughput limit on execution whereas trap could be more (1 per cycle
> vs 4 per cycle on POWER9).

I thought only *taken* branches are just one per cycle?  And those
branches are only taken for the exceptional condition (or the case where
we do not care about performance, anyway, if we do have an error most of
the time ;-) )

> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
> is the CFAR issue as well which makes it a problem for 64s. It would
> have been nice if it could use the same code though.

On 64-bit the code looks better for the no-error path as well.

> Maybe one day gcc's __builtin_trap() will become smart enough around
> conditional statements that it it generates better code and tries to
> avoid branches.

Internally *all* traps are conditional, in GCC.  It also can optimise
them quite well.  There must be something in the kernel macros that
prevents good optimisation.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-18 15:06     ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-18 15:06 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras, linuxppc-dev, linux-kernel

On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> This one possibly the branches end up in predictors, whereas conditional 
> trap is always just speculated not to hit. Branches may also have a
> throughput limit on execution whereas trap could be more (1 per cycle
> vs 4 per cycle on POWER9).

I thought only *taken* branches are just one per cycle?  And those
branches are only taken for the exceptional condition (or the case where
we do not care about performance, anyway, if we do have an error most of
the time ;-) )

> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
> is the CFAR issue as well which makes it a problem for 64s. It would
> have been nice if it could use the same code though.

On 64-bit the code looks better for the no-error path as well.

> Maybe one day gcc's __builtin_trap() will become smart enough around
> conditional statements that it it generates better code and tries to
> avoid branches.

Internally *all* traps are conditional, in GCC.  It also can optimise
them quite well.  There must be something in the kernel macros that
prevents good optimisation.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-04-13 16:38   ` Christophe Leroy
@ 2021-08-25 21:25     ` Nathan Chancellor
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-25 21:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev, clang-built-linux, llvm

Hi Christophe,

On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
> 
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.
> 
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
> 
> 	unsigned long test(struct pt_regs *regs)
> 	{
> 		int ret;
> 
> 		WARN_ON(regs->msr & MSR_PR);
> 
> 		return regs->gpr[3];
> 	}
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3c4:	4e 80 00 20 	blr
> 
> 	0000000000000bf0 <.test9w>:
> 	 bf0:	7c 89 00 74 	cntlzd  r9,r4
> 	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 bf8:	0b 09 00 00 	tdnei   r9,0
> 	 bfc:	2c 24 00 00 	cmpdi   r4,0
> 	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
> 	 c04:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c08:	4e 80 00 20 	blr
> 
> 	 c0c:	38 60 00 00 	li      r3,0
> 	 c10:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 
> 	0000000000000c50 <.test9w>:
> 	 c50:	7c 89 00 74 	cntlzd  r9,r4
> 	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 c58:	0b 09 00 00 	tdnei   r9,0
> 	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c60:	4e 80 00 20 	blr
> 
> 	 c70:	38 60 00 00 	li      r3,0
> 	 c74:	4e 80 00 20 	blr
> 
> In the first exemple, we see GCC doesn't need to duplicate what
> happens after the trap.
> 
> In the second exemple, we see that GCC doesn't need to emit a test
> and a branch in the likely path in addition to the trap.
> 
> We've got some WARN_ON() in .softirqentry.text section so it needs
> to be added in the OTHER_TEXT_SECTIONS in modpost.c
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
klist_add_tail to trigger over and over on boot when compiling with
clang:

[    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
[    2.177456][    T1] Modules linked in:
[    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
[    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
[    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
[    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
[    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
[    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
[    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
[    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
[    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
[    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
[    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
[    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
[    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
[    2.178088][    T1] Call Trace:
[    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
[    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
[    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
[    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
[    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
[    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
[    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
[    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
[    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
[    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
[    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
[    2.178569][    T1] Instruction dump:
[    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
[    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
[    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---

Is this a bug with clang or is there something wrong with the patch? The
vmlinux image is available at [1] if you want to inspect it and our QEMU
command and the warning at boot can be viewed at [2]. If there is any
other information I can provide, please let me know.

[1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
[2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-25 21:25     ` Nathan Chancellor
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-25 21:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-kernel, clang-built-linux, Paul Mackerras, llvm, linuxppc-dev

Hi Christophe,

On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
> 
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.
> 
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
> 
> 	unsigned long test(struct pt_regs *regs)
> 	{
> 		int ret;
> 
> 		WARN_ON(regs->msr & MSR_PR);
> 
> 		return regs->gpr[3];
> 	}
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3c4:	4e 80 00 20 	blr
> 
> 	0000000000000bf0 <.test9w>:
> 	 bf0:	7c 89 00 74 	cntlzd  r9,r4
> 	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 bf8:	0b 09 00 00 	tdnei   r9,0
> 	 bfc:	2c 24 00 00 	cmpdi   r4,0
> 	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
> 	 c04:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c08:	4e 80 00 20 	blr
> 
> 	 c0c:	38 60 00 00 	li      r3,0
> 	 c10:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	000003a8 <test>:
> 	 3a8:	81 23 00 84 	lwz     r9,132(r3)
> 	 3ac:	71 29 40 00 	andi.   r9,r9,16384
> 	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
> 	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
> 	 3b8:	4e 80 00 20 	blr
> 
> 	 3bc:	0f e0 00 00 	twui    r0,0
> 
> 	0000000000000c50 <.test9w>:
> 	 c50:	7c 89 00 74 	cntlzd  r9,r4
> 	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
> 	 c58:	0b 09 00 00 	tdnei   r9,0
> 	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
> 	 c60:	4e 80 00 20 	blr
> 
> 	 c70:	38 60 00 00 	li      r3,0
> 	 c74:	4e 80 00 20 	blr
> 
> In the first exemple, we see GCC doesn't need to duplicate what
> happens after the trap.
> 
> In the second exemple, we see that GCC doesn't need to emit a test
> and a branch in the likely path in addition to the trap.
> 
> We've got some WARN_ON() in .softirqentry.text section so it needs
> to be added in the OTHER_TEXT_SECTIONS in modpost.c
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
klist_add_tail to trigger over and over on boot when compiling with
clang:

[    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
[    2.177456][    T1] Modules linked in:
[    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
[    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
[    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
[    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
[    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
[    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
[    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
[    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
[    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
[    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
[    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
[    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
[    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
[    2.178088][    T1] Call Trace:
[    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
[    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
[    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
[    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
[    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
[    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
[    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
[    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
[    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
[    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
[    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
[    2.178569][    T1] Instruction dump:
[    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
[    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
[    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---

Is this a bug with clang or is there something wrong with the patch? The
vmlinux image is available at [1] if you want to inspect it and our QEMU
command and the warning at boot can be viewed at [2]. If there is any
other information I can provide, please let me know.

[1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
[2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-25 21:25     ` Nathan Chancellor
@ 2021-08-26  3:21       ` Michael Ellerman
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-26  3:21 UTC (permalink / raw)
  To: Nathan Chancellor, Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel,
	linuxppc-dev, clang-built-linux, llvm

Nathan Chancellor <nathan@kernel.org> writes:
> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> flexibility to GCC.
...
>
> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> klist_add_tail to trigger over and over on boot when compiling with
> clang:
>
> [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> [    2.177456][    T1] Modules linked in:
> [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> [    2.178088][    T1] Call Trace:
> [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> [    2.178569][    T1] Instruction dump:
> [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
>
> Is this a bug with clang or is there something wrong with the patch? The
> vmlinux image is available at [1] if you want to inspect it and our QEMU
> command and the warning at boot can be viewed at [2]. If there is any
> other information I can provide, please let me know.
>
> [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Thanks.

This is the generated assembly:

c0000000007ff600 <.klist_add_tail>:
c0000000007ff600:       7c 08 02 a6     mflr    r0
c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero


From:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
	knode->n_klist = klist;
	/* no knode deserves to start its life dead */
	WARN_ON(knode_dead(knode));
                ^^^^^^^^^^^^^^^^^
}

Which expands to:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
	knode->n_klist = klist;

	({
		bool __ret_warn_on = false;
		do {
                ...
			} else {
				__label__ __label_warn_on;
				do {
					asm goto(
						"1:   "
						"tdnei"
						"
						" " % 4,
						0 " "\n " ".section __ex_table,\"a\";"
										" "
										".balign 4;"
										" "
										".long (1b) - . ;"
										" "
										".long (%l[__label_warn_on]) - . ;"
										" "
										".previous"
										" "
										".section __bug_table,\"aw\"\n"
										"2:\t.4byte 1b - 2b, %0 - 2b\n"
										"\t.short %1, %2\n"
										".org 2b+%3\n"
										".previous\n"
						:
						: "i"("lib/klist.c"), "i"(62),
						  "i"((1 << 0) | ((9) << 8)),
						  "i"(sizeof(struct bug_entry)),
						  "r"(knode_dead(knode))
                                                  ^^^^^^^^^^^^^^^^^^^^^

						:
						: __label_warn_on);
					asm("");
				} while (0);
				break;
			__label_warn_on:
				__ret_warn_on = true;
			}
		} while (0);
		__builtin_expect(!!(__ret_warn_on), 0);
	});
}

And knode_dead is:

#define KNODE_DEAD		1LU

static bool knode_dead(struct klist_node *knode)
{
	return (unsigned long)knode->n_klist & KNODE_DEAD;
}


So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.

But in the asm:

c0000000007ff600 <.klist_add_tail>:
...
c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
...
c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero


It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".

In the GCC output you can see it:

c0000000008c8a30 <.klist_node_init>:
c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
c0000000008c8a34:       39 40 00 01     li      r10,1
c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0

ie. the clrlwi is "clear left (word) immediate", and zeroes everything
except bit 0, which is equivalent to "& KNODE_DEAD".


So there seems to be some misunderstanding with clang, it doesn't like
us passing an expression to the inline asm.

AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
doesn't have to just be a variable name.

This patch seems to fix it. Not sure if that's just papering over it though.

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..75fcb4370d96 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on, "r" (!!(x))); \
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\


Generates:

c0000000008e2ac0 <.klist_add_tail>:
c0000000008e2ac0:       7c 08 02 a6     mflr    r0
c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
c0000000008e2ad4:       38 60 00 01     li      r3,1
c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-


cheers

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26  3:21       ` Michael Ellerman
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-26  3:21 UTC (permalink / raw)
  To: Nathan Chancellor, Christophe Leroy
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev

Nathan Chancellor <nathan@kernel.org> writes:
> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> flexibility to GCC.
...
>
> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> klist_add_tail to trigger over and over on boot when compiling with
> clang:
>
> [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> [    2.177456][    T1] Modules linked in:
> [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> [    2.178088][    T1] Call Trace:
> [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> [    2.178569][    T1] Instruction dump:
> [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
>
> Is this a bug with clang or is there something wrong with the patch? The
> vmlinux image is available at [1] if you want to inspect it and our QEMU
> command and the warning at boot can be viewed at [2]. If there is any
> other information I can provide, please let me know.
>
> [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Thanks.

This is the generated assembly:

c0000000007ff600 <.klist_add_tail>:
c0000000007ff600:       7c 08 02 a6     mflr    r0
c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero


From:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
	knode->n_klist = klist;
	/* no knode deserves to start its life dead */
	WARN_ON(knode_dead(knode));
                ^^^^^^^^^^^^^^^^^
}

Which expands to:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
	knode->n_klist = klist;

	({
		bool __ret_warn_on = false;
		do {
                ...
			} else {
				__label__ __label_warn_on;
				do {
					asm goto(
						"1:   "
						"tdnei"
						"
						" " % 4,
						0 " "\n " ".section __ex_table,\"a\";"
										" "
										".balign 4;"
										" "
										".long (1b) - . ;"
										" "
										".long (%l[__label_warn_on]) - . ;"
										" "
										".previous"
										" "
										".section __bug_table,\"aw\"\n"
										"2:\t.4byte 1b - 2b, %0 - 2b\n"
										"\t.short %1, %2\n"
										".org 2b+%3\n"
										".previous\n"
						:
						: "i"("lib/klist.c"), "i"(62),
						  "i"((1 << 0) | ((9) << 8)),
						  "i"(sizeof(struct bug_entry)),
						  "r"(knode_dead(knode))
                                                  ^^^^^^^^^^^^^^^^^^^^^

						:
						: __label_warn_on);
					asm("");
				} while (0);
				break;
			__label_warn_on:
				__ret_warn_on = true;
			}
		} while (0);
		__builtin_expect(!!(__ret_warn_on), 0);
	});
}

And knode_dead is:

#define KNODE_DEAD		1LU

static bool knode_dead(struct klist_node *knode)
{
	return (unsigned long)knode->n_klist & KNODE_DEAD;
}


So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.

But in the asm:

c0000000007ff600 <.klist_add_tail>:
...
c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
...
c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero


It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".

In the GCC output you can see it:

c0000000008c8a30 <.klist_node_init>:
c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
c0000000008c8a34:       39 40 00 01     li      r10,1
c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0

ie. the clrlwi is "clear left (word) immediate", and zeroes everything
except bit 0, which is equivalent to "& KNODE_DEAD".


So there seems to be some misunderstanding with clang, it doesn't like
us passing an expression to the inline asm.

AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
doesn't have to just be a variable name.

This patch seems to fix it. Not sure if that's just papering over it though.

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..75fcb4370d96 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on, "r" (!!(x))); \
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\


Generates:

c0000000008e2ac0 <.klist_add_tail>:
c0000000008e2ac0:       7c 08 02 a6     mflr    r0
c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
c0000000008e2ad4:       38 60 00 01     li      r3,1
c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-


cheers

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-18 15:06     ` Segher Boessenkool
@ 2021-08-26  3:26       ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26  3:26 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> This one possibly the branches end up in predictors, whereas conditional 
>> trap is always just speculated not to hit. Branches may also have a
>> throughput limit on execution whereas trap could be more (1 per cycle
>> vs 4 per cycle on POWER9).
> 
> I thought only *taken* branches are just one per cycle?

Taken branches are fetched by the front end at one per cycle (assuming 
they hit the BTAC), but all branches have to be executed by BR at one 
per cycle. On POWER9 BR even has to execute some other things like mflr
as well, but at least that's improved on POWER10.

Trap is executed at 4 per cycle and will never use branch table entries 
or alias with a non-tagged predictor and mispredict.

> And those
> branches are only taken for the exceptional condition (or the case where
> we do not care about performance, anyway, if we do have an error most of
> the time ;-) )

It's not that big a deal, but trap is really the best instruction for 
this.

> 
>> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
>> is the CFAR issue as well which makes it a problem for 64s. It would
>> have been nice if it could use the same code though.
> 
> On 64-bit the code looks better for the no-error path as well.
> 
>> Maybe one day gcc's __builtin_trap() will become smart enough around
>> conditional statements that it it generates better code and tries to
>> avoid branches.
> 
> Internally *all* traps are conditional, in GCC.  It also can optimise
> them quite well.  There must be something in the kernel macros that
> prevents good optimisation.

I did take a look at it at one point.

One problem is that the kernel needs the address of the trap instruction 
to create the entry for it. The other problem is that __builtin_trap 
does not return so it can't be used for WARN. LLVM at least seems to 
have a __builtin_debugtrap which does return.

The first problem seems like the show stopper though. AFAIKS it would 
need a special builtin support that does something to create the table
entry, or a guarantee that we could put an inline asm right after the
builtin as a recognized pattern and that would give us the instruction
following the trap.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26  3:26       ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26  3:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> This one possibly the branches end up in predictors, whereas conditional 
>> trap is always just speculated not to hit. Branches may also have a
>> throughput limit on execution whereas trap could be more (1 per cycle
>> vs 4 per cycle on POWER9).
> 
> I thought only *taken* branches are just one per cycle?

Taken branches are fetched by the front end at one per cycle (assuming 
they hit the BTAC), but all branches have to be executed by BR at one 
per cycle. On POWER9 BR even has to execute some other things like mflr
as well, but at least that's improved on POWER10.

Trap is executed at 4 per cycle and will never use branch table entries 
or alias with a non-tagged predictor and mispredict.

> And those
> branches are only taken for the exceptional condition (or the case where
> we do not care about performance, anyway, if we do have an error most of
> the time ;-) )

It's not that big a deal, but trap is really the best instruction for 
this.

> 
>> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
>> is the CFAR issue as well which makes it a problem for 64s. It would
>> have been nice if it could use the same code though.
> 
> On 64-bit the code looks better for the no-error path as well.
> 
>> Maybe one day gcc's __builtin_trap() will become smart enough around
>> conditional statements that it it generates better code and tries to
>> avoid branches.
> 
> Internally *all* traps are conditional, in GCC.  It also can optimise
> them quite well.  There must be something in the kernel macros that
> prevents good optimisation.

I did take a look at it at one point.

One problem is that the kernel needs the address of the trap instruction 
to create the entry for it. The other problem is that __builtin_trap 
does not return so it can't be used for WARN. LLVM at least seems to 
have a __builtin_debugtrap which does return.

The first problem seems like the show stopper though. AFAIKS it would 
need a special builtin support that does something to create the table
entry, or a guarantee that we could put an inline asm right after the
builtin as a recognized pattern and that would give us the instruction
following the trap.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26  3:21       ` Michael Ellerman
@ 2021-08-26  6:37         ` Christophe Leroy
  -1 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-08-26  6:37 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Chancellor
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel,
	linuxppc-dev, clang-built-linux, llvm



Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
> Nathan Chancellor <nathan@kernel.org> writes:
>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>> flexibility to GCC.
> ...
>>
>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> klist_add_tail to trigger over and over on boot when compiling with
>> clang:

...

> 
> This patch seems to fix it. Not sure if that's just papering over it though.
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:						\
>   								\
>   			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>   				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on, "r" (!!(x))); \
>   			break;					\
>   __label_warn_on:						\
>   			__ret_warn_on = true;			\
> 
> 

But for a simple WARN_ON() call:

void test(unsigned long b)
{
	WARN_ON(b);
}

Without your change with GCC you get:

00000000000012d0 <.test>:
     12d0:	0b 03 00 00 	tdnei   r3,0
     12d4:	4e 80 00 20 	blr


With the !! change you get:

00000000000012d0 <.test>:
     12d0:	31 23 ff ff 	addic   r9,r3,-1
     12d4:	7d 29 19 10 	subfe   r9,r9,r3
     12d8:	0b 09 00 00 	tdnei   r9,0
     12dc:	4e 80 00 20 	blr

Christophe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26  6:37         ` Christophe Leroy
  0 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-08-26  6:37 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev



Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
> Nathan Chancellor <nathan@kernel.org> writes:
>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>> flexibility to GCC.
> ...
>>
>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> klist_add_tail to trigger over and over on boot when compiling with
>> clang:

...

> 
> This patch seems to fix it. Not sure if that's just papering over it though.
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:						\
>   								\
>   			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>   				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on, "r" (!!(x))); \
>   			break;					\
>   __label_warn_on:						\
>   			__ret_warn_on = true;			\
> 
> 

But for a simple WARN_ON() call:

void test(unsigned long b)
{
	WARN_ON(b);
}

Without your change with GCC you get:

00000000000012d0 <.test>:
     12d0:	0b 03 00 00 	tdnei   r3,0
     12d4:	4e 80 00 20 	blr


With the !! change you get:

00000000000012d0 <.test>:
     12d0:	31 23 ff ff 	addic   r9,r3,-1
     12d4:	7d 29 19 10 	subfe   r9,r9,r3
     12d8:	0b 09 00 00 	tdnei   r9,0
     12dc:	4e 80 00 20 	blr

Christophe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26  3:26       ` Nicholas Piggin
@ 2021-08-26 12:49         ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 12:49 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

Hi!

On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> This one possibly the branches end up in predictors, whereas conditional 
> >> trap is always just speculated not to hit. Branches may also have a
> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> vs 4 per cycle on POWER9).
> > 
> > I thought only *taken* branches are just one per cycle?
> 
> Taken branches are fetched by the front end at one per cycle (assuming 
> they hit the BTAC), but all branches have to be executed by BR at one 
> per cycle

This is not true.  (Simple) predicted not-taken conditional branches are
just folded out, never hit the issue queues.  And they are fetched as
many together as fit in a fetch group, can complete without limits as
well.

The BTAC is a frontend thing, used for target address prediction.  It
does not limit execution.

Correctly predicted simple conditional branches just get their prediction
validated (and that is not done in the execution units).  Incorrectly
predicted branches the same, but those cause a redirect and refetch.

> > Internally *all* traps are conditional, in GCC.  It also can optimise
> > them quite well.  There must be something in the kernel macros that
> > prevents good optimisation.
> 
> I did take a look at it at one point.
> 
> One problem is that the kernel needs the address of the trap instruction 
> to create the entry for it. The other problem is that __builtin_trap 
> does not return so it can't be used for WARN. LLVM at least seems to 
> have a __builtin_debugtrap which does return.

This is <https://gcc.gnu.org/PR99299>.

> The first problem seems like the show stopper though. AFAIKS it would 
> need a special builtin support that does something to create the table
> entry, or a guarantee that we could put an inline asm right after the
> builtin as a recognized pattern and that would give us the instruction
> following the trap.

I'm not quite sure what this means.  Can't you always just put a

bla:	asm("");

in there, and use the address of "bla"?  If not, you need to say a lot
more about what you actually want to do :-/


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26 12:49         ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 12:49 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

Hi!

On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> This one possibly the branches end up in predictors, whereas conditional 
> >> trap is always just speculated not to hit. Branches may also have a
> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> vs 4 per cycle on POWER9).
> > 
> > I thought only *taken* branches are just one per cycle?
> 
> Taken branches are fetched by the front end at one per cycle (assuming 
> they hit the BTAC), but all branches have to be executed by BR at one 
> per cycle

This is not true.  (Simple) predicted not-taken conditional branches are
just folded out, never hit the issue queues.  And they are fetched as
many together as fit in a fetch group, can complete without limits as
well.

The BTAC is a frontend thing, used for target address prediction.  It
does not limit execution.

Correctly predicted simple conditional branches just get their prediction
validated (and that is not done in the execution units).  Incorrectly
predicted branches the same, but those cause a redirect and refetch.

> > Internally *all* traps are conditional, in GCC.  It also can optimise
> > them quite well.  There must be something in the kernel macros that
> > prevents good optimisation.
> 
> I did take a look at it at one point.
> 
> One problem is that the kernel needs the address of the trap instruction 
> to create the entry for it. The other problem is that __builtin_trap 
> does not return so it can't be used for WARN. LLVM at least seems to 
> have a __builtin_debugtrap which does return.

This is <https://gcc.gnu.org/PR99299>.

> The first problem seems like the show stopper though. AFAIKS it would 
> need a special builtin support that does something to create the table
> entry, or a guarantee that we could put an inline asm right after the
> builtin as a recognized pattern and that would give us the instruction
> following the trap.

I'm not quite sure what this means.  Can't you always just put a

bla:	asm("");

in there, and use the address of "bla"?  If not, you need to say a lot
more about what you actually want to do :-/


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26  6:37         ` Christophe Leroy
@ 2021-08-26 13:47           ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 13:47 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: llvm, linux-kernel, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 08:37:09AM +0200, Christophe Leroy wrote:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
> >This patch seems to fix it. Not sure if that's just papering over it 
> >though.
> >
> >diff --git a/arch/powerpc/include/asm/bug.h 
> >b/arch/powerpc/include/asm/bug.h
> >index 1ee0f22313ee..75fcb4370d96 100644
> >--- a/arch/powerpc/include/asm/bug.h
> >+++ b/arch/powerpc/include/asm/bug.h
> >@@ -119,7 +119,7 @@ __label_warn_on:					 \
> >  								\
> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
> >  				   BUGFLAG_WARNING | 
> >  				   BUGFLAG_TAINT(TAINT_WARN),	\
> >-				   __label_warn_on, "r" (x));	\
> >+				   __label_warn_on, "r" (!!(x))); \
> >  			break;					\
> >  __label_warn_on:						\
> >  			__ret_warn_on = true;			\
> 
> But for a simple WARN_ON() call:
> 
> void test(unsigned long b)
> {
> 	WARN_ON(b);
> }
> 
> Without your change with GCC you get:
> 
> 00000000000012d0 <.test>:
>     12d0:	0b 03 00 00 	tdnei   r3,0
>     12d4:	4e 80 00 20 	blr
> 
> 
> With the !! change you get:
> 
> 00000000000012d0 <.test>:
>     12d0:	31 23 ff ff 	addic   r9,r3,-1
>     12d4:	7d 29 19 10 	subfe   r9,r9,r3
>     12d8:	0b 09 00 00 	tdnei   r9,0
>     12dc:	4e 80 00 20 	blr

That is because the asm (unlike the builtin) cannot be optimised by the
compiler.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 13:47           ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 13:47 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Michael Ellerman, Nathan Chancellor, llvm, linux-kernel,
	clang-built-linux, Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 08:37:09AM +0200, Christophe Leroy wrote:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
> >This patch seems to fix it. Not sure if that's just papering over it 
> >though.
> >
> >diff --git a/arch/powerpc/include/asm/bug.h 
> >b/arch/powerpc/include/asm/bug.h
> >index 1ee0f22313ee..75fcb4370d96 100644
> >--- a/arch/powerpc/include/asm/bug.h
> >+++ b/arch/powerpc/include/asm/bug.h
> >@@ -119,7 +119,7 @@ __label_warn_on:					 \
> >  								\
> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
> >  				   BUGFLAG_WARNING | 
> >  				   BUGFLAG_TAINT(TAINT_WARN),	\
> >-				   __label_warn_on, "r" (x));	\
> >+				   __label_warn_on, "r" (!!(x))); \
> >  			break;					\
> >  __label_warn_on:						\
> >  			__ret_warn_on = true;			\
> 
> But for a simple WARN_ON() call:
> 
> void test(unsigned long b)
> {
> 	WARN_ON(b);
> }
> 
> Without your change with GCC you get:
> 
> 00000000000012d0 <.test>:
>     12d0:	0b 03 00 00 	tdnei   r3,0
>     12d4:	4e 80 00 20 	blr
> 
> 
> With the !! change you get:
> 
> 00000000000012d0 <.test>:
>     12d0:	31 23 ff ff 	addic   r9,r3,-1
>     12d4:	7d 29 19 10 	subfe   r9,r9,r3
>     12d8:	0b 09 00 00 	tdnei   r9,0
>     12dc:	4e 80 00 20 	blr

That is because the asm (unlike the builtin) cannot be optimised by the
compiler.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26 12:49         ` Segher Boessenkool
@ 2021-08-26 13:57           ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26 13:57 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
> 
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> > 
>> > I thought only *taken* branches are just one per cycle?
>> 
>> Taken branches are fetched by the front end at one per cycle (assuming 
>> they hit the BTAC), but all branches have to be executed by BR at one 
>> per cycle
> 
> This is not true.  (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues.  And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's 
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction.  It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units).  Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

> 
>> > Internally *all* traps are conditional, in GCC.  It also can optimise
>> > them quite well.  There must be something in the kernel macros that
>> > prevents good optimisation.
>> 
>> I did take a look at it at one point.
>> 
>> One problem is that the kernel needs the address of the trap instruction 
>> to create the entry for it. The other problem is that __builtin_trap 
>> does not return so it can't be used for WARN. LLVM at least seems to 
>> have a __builtin_debugtrap which does return.
> 
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would 
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
> 
> I'm not quite sure what this means.  Can't you always just put a
> 
> bla:	asm("");
> 
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

   __builtin_trap();
   asm ("1:                               \n\t"
        "    .section __bug_table,\"aw\"  \n\t"
        "2:  .4byte 1b - 2b - 4           \n\t"
        "    .previous");

Where the 1: label must follow immediately after the trap instruction, 
and where the asm must be emitted even for the case of no-return traps 
(but anything following the asm could be omitted).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26 13:57           ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26 13:57 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
> 
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> > 
>> > I thought only *taken* branches are just one per cycle?
>> 
>> Taken branches are fetched by the front end at one per cycle (assuming 
>> they hit the BTAC), but all branches have to be executed by BR at one 
>> per cycle
> 
> This is not true.  (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues.  And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's 
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction.  It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units).  Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

> 
>> > Internally *all* traps are conditional, in GCC.  It also can optimise
>> > them quite well.  There must be something in the kernel macros that
>> > prevents good optimisation.
>> 
>> I did take a look at it at one point.
>> 
>> One problem is that the kernel needs the address of the trap instruction 
>> to create the entry for it. The other problem is that __builtin_trap 
>> does not return so it can't be used for WARN. LLVM at least seems to 
>> have a __builtin_debugtrap which does return.
> 
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would 
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
> 
> I'm not quite sure what this means.  Can't you always just put a
> 
> bla:	asm("");
> 
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

   __builtin_trap();
   asm ("1:                               \n\t"
        "    .section __bug_table,\"aw\"  \n\t"
        "2:  .4byte 1b - 2b - 4           \n\t"
        "    .previous");

Where the 1: label must follow immediately after the trap instruction, 
and where the asm must be emitted even for the case of no-return traps 
(but anything following the asm could be omitted).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26  3:21       ` Michael Ellerman
@ 2021-08-26 14:12         ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 14:12 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Chancellor, Christophe Leroy, llvm, linux-kernel,
	clang-built-linux, Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.

It certainly is.  That is the whole point of inline asm!  This way, all
of the C code "around" the asm can be optimised.

> This patch seems to fix it. Not sure if that's just papering over it though.

It is, and it makes less optimised code (also on GCC), as Christophe
points out.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 14:12         ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 14:12 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, linux-kernel, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.

It certainly is.  That is the whole point of inline asm!  This way, all
of the C code "around" the asm can be optimised.

> This patch seems to fix it. Not sure if that's just papering over it though.

It is, and it makes less optimised code (also on GCC), as Christophe
points out.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26 13:57           ` Nicholas Piggin
@ 2021-08-26 14:37             ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 14:37 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> >> This one possibly the branches end up in predictors, whereas conditional 
> >> >> trap is always just speculated not to hit. Branches may also have a
> >> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> >> vs 4 per cycle on POWER9).
> >> > 
> >> > I thought only *taken* branches are just one per cycle?
> >> 
> >> Taken branches are fetched by the front end at one per cycle (assuming 
> >> they hit the BTAC), but all branches have to be executed by BR at one 
> >> per cycle
> > 
> > This is not true.  (Simple) predicted not-taken conditional branches are
> > just folded out, never hit the issue queues.  And they are fetched as
> > many together as fit in a fetch group, can complete without limits as
> > well.
> 
> No, they are all dispatched and issue to the BRU for execution. It's 
> trivial to construct a test of a lot of not taken branches in a row
> and time a loop of it to see it executes at 1 cycle per branch.

(s/dispatched/issued/)

Huh, this was true on p8 already.  Sorry for my confusion.  In my
defence, this doesn't matter for performance on "real code".

> > Correctly predicted simple conditional branches just get their prediction
> > validated (and that is not done in the execution units).  Incorrectly
> > predicted branches the same, but those cause a redirect and refetch.
> 
> How could it validate prediction without issuing? It wouldn't know when
> sources are ready.

In the backend.  But that is just how it worked on older cores :-/

> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> need a special builtin support that does something to create the table
> >> entry, or a guarantee that we could put an inline asm right after the
> >> builtin as a recognized pattern and that would give us the instruction
> >> following the trap.
> > 
> > I'm not quite sure what this means.  Can't you always just put a
> > 
> > bla:	asm("");
> > 
> > in there, and use the address of "bla"?
> 
> Not AFAIKS. Put it where?

After wherever you want to know the address after.  You will have to
make sure they stay together somehow.

It is much easier to get the address of something, not the address after
it.  If you know it is just one insn anyway, that isn't hard to handle
either (even if prefixed insns exist).

> > If not, you need to say a lot
> > more about what you actually want to do :-/
> 
> We need to put the address (or relative offset) of the trap instruction
> into an entry in a different section. Basically this:
> 
>    __builtin_trap();
>    asm ("1:                               \n\t"
>         "    .section __bug_table,\"aw\"  \n\t"
>         "2:  .4byte 1b - 2b - 4           \n\t"
>         "    .previous");
> 
> Where the 1: label must follow immediately after the trap instruction, 
> and where the asm must be emitted even for the case of no-return traps 
> (but anything following the asm could be omitted).

The address *after* the trap insn?  That is much much harder than the
address *of* the trap insn.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26 14:37             ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 14:37 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> >> This one possibly the branches end up in predictors, whereas conditional 
> >> >> trap is always just speculated not to hit. Branches may also have a
> >> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> >> vs 4 per cycle on POWER9).
> >> > 
> >> > I thought only *taken* branches are just one per cycle?
> >> 
> >> Taken branches are fetched by the front end at one per cycle (assuming 
> >> they hit the BTAC), but all branches have to be executed by BR at one 
> >> per cycle
> > 
> > This is not true.  (Simple) predicted not-taken conditional branches are
> > just folded out, never hit the issue queues.  And they are fetched as
> > many together as fit in a fetch group, can complete without limits as
> > well.
> 
> No, they are all dispatched and issue to the BRU for execution. It's 
> trivial to construct a test of a lot of not taken branches in a row
> and time a loop of it to see it executes at 1 cycle per branch.

(s/dispatched/issued/)

Huh, this was true on p8 already.  Sorry for my confusion.  In my
defence, this doesn't matter for performance on "real code".

> > Correctly predicted simple conditional branches just get their prediction
> > validated (and that is not done in the execution units).  Incorrectly
> > predicted branches the same, but those cause a redirect and refetch.
> 
> How could it validate prediction without issuing? It wouldn't know when
> sources are ready.

In the backend.  But that is just how it worked on older cores :-/

> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> need a special builtin support that does something to create the table
> >> entry, or a guarantee that we could put an inline asm right after the
> >> builtin as a recognized pattern and that would give us the instruction
> >> following the trap.
> > 
> > I'm not quite sure what this means.  Can't you always just put a
> > 
> > bla:	asm("");
> > 
> > in there, and use the address of "bla"?
> 
> Not AFAIKS. Put it where?

After wherever you want to know the address after.  You will have to
make sure they stay together somehow.

It is much easier to get the address of something, not the address after
it.  If you know it is just one insn anyway, that isn't hard to handle
either (even if prefixed insns exist).

> > If not, you need to say a lot
> > more about what you actually want to do :-/
> 
> We need to put the address (or relative offset) of the trap instruction
> into an entry in a different section. Basically this:
> 
>    __builtin_trap();
>    asm ("1:                               \n\t"
>         "    .section __bug_table,\"aw\"  \n\t"
>         "2:  .4byte 1b - 2b - 4           \n\t"
>         "    .previous");
> 
> Where the 1: label must follow immediately after the trap instruction, 
> and where the asm must be emitted even for the case of no-return traps 
> (but anything following the asm could be omitted).

The address *after* the trap insn?  That is much much harder than the
address *of* the trap insn.


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26  6:37         ` Christophe Leroy
@ 2021-08-26 14:45           ` Michael Ellerman
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-26 14:45 UTC (permalink / raw)
  To: Christophe Leroy, Nathan Chancellor
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel,
	linuxppc-dev, clang-built-linux, llvm

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>> Nathan Chancellor <nathan@kernel.org> writes:
>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>> flexibility to GCC.
>> ...
>>>
>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>> klist_add_tail to trigger over and over on boot when compiling with
>>> clang:
>
> ...
>
>> 
>> This patch seems to fix it. Not sure if that's just papering over it though.
>> 
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index 1ee0f22313ee..75fcb4370d96 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>   								\
>>   			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>   				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>> -				   __label_warn_on, "r" (x));	\
>> +				   __label_warn_on, "r" (!!(x))); \
>>   			break;					\
>>   __label_warn_on:						\
>>   			__ret_warn_on = true;			\
>
> But for a simple WARN_ON() call:
>
> void test(unsigned long b)
> {
> 	WARN_ON(b);
> }
>
> Without your change with GCC you get:
>
> 00000000000012d0 <.test>:
>      12d0:	0b 03 00 00 	tdnei   r3,0
>      12d4:	4e 80 00 20 	blr
>
>
> With the !! change you get:
>
> 00000000000012d0 <.test>:
>      12d0:	31 23 ff ff 	addic   r9,r3,-1
>      12d4:	7d 29 19 10 	subfe   r9,r9,r3
>      12d8:	0b 09 00 00 	tdnei   r9,0
>      12dc:	4e 80 00 20 	blr

Yeah that's a pity.

We could do something like below, which is ugly, but would be better
than having to revert the whole thing.

Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

So possibly we need a CLANG ifdef around the whole thing, and use the
old style warn for clang.

cheers


diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..d978d9004d0d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -106,6 +106,12 @@ __label_warn_on:						\
 	}							\
 } while (0)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __clang_warn_hack(x)	(!!(x))
+#else
+#define __clang_warn_hack(x)	(x)
+#endif
+
 #define WARN_ON(x) ({						\
 	bool __ret_warn_on = false;				\
 	do {							\
@@ -119,7 +125,8 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on,		\
+				   "r" __clang_warn_hack(x));	\
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 14:45           ` Michael Ellerman
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-26 14:45 UTC (permalink / raw)
  To: Christophe Leroy, Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>> Nathan Chancellor <nathan@kernel.org> writes:
>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>> flexibility to GCC.
>> ...
>>>
>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>> klist_add_tail to trigger over and over on boot when compiling with
>>> clang:
>
> ...
>
>> 
>> This patch seems to fix it. Not sure if that's just papering over it though.
>> 
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index 1ee0f22313ee..75fcb4370d96 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>   								\
>>   			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>   				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>> -				   __label_warn_on, "r" (x));	\
>> +				   __label_warn_on, "r" (!!(x))); \
>>   			break;					\
>>   __label_warn_on:						\
>>   			__ret_warn_on = true;			\
>
> But for a simple WARN_ON() call:
>
> void test(unsigned long b)
> {
> 	WARN_ON(b);
> }
>
> Without your change with GCC you get:
>
> 00000000000012d0 <.test>:
>      12d0:	0b 03 00 00 	tdnei   r3,0
>      12d4:	4e 80 00 20 	blr
>
>
> With the !! change you get:
>
> 00000000000012d0 <.test>:
>      12d0:	31 23 ff ff 	addic   r9,r3,-1
>      12d4:	7d 29 19 10 	subfe   r9,r9,r3
>      12d8:	0b 09 00 00 	tdnei   r9,0
>      12dc:	4e 80 00 20 	blr

Yeah that's a pity.

We could do something like below, which is ugly, but would be better
than having to revert the whole thing.

Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

So possibly we need a CLANG ifdef around the whole thing, and use the
old style warn for clang.

cheers


diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..d978d9004d0d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -106,6 +106,12 @@ __label_warn_on:						\
 	}							\
 } while (0)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __clang_warn_hack(x)	(!!(x))
+#else
+#define __clang_warn_hack(x)	(x)
+#endif
+
 #define WARN_ON(x) ({						\
 	bool __ret_warn_on = false;				\
 	do {							\
@@ -119,7 +125,8 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on,		\
+				   "r" __clang_warn_hack(x));	\
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26 14:45           ` Michael Ellerman
@ 2021-08-26 14:53             ` Christophe Leroy
  -1 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-08-26 14:53 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Chancellor
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel,
	linuxppc-dev, clang-built-linux, llvm



Le 26/08/2021 à 16:45, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>>> Nathan Chancellor <nathan@kernel.org> writes:
>>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>>> flexibility to GCC.
>>> ...
>>>>
>>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>>> klist_add_tail to trigger over and over on boot when compiling with
>>>> clang:
>>
>> ...
>>
>>>
>>> This patch seems to fix it. Not sure if that's just papering over it though.
>>>
>>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>>> index 1ee0f22313ee..75fcb4370d96 100644
>>> --- a/arch/powerpc/include/asm/bug.h
>>> +++ b/arch/powerpc/include/asm/bug.h
>>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>>    								\
>>>    			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>>    				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>>> -				   __label_warn_on, "r" (x));	\
>>> +				   __label_warn_on, "r" (!!(x))); \
>>>    			break;					\
>>>    __label_warn_on:						\
>>>    			__ret_warn_on = true;			\
>>
>> But for a simple WARN_ON() call:
>>
>> void test(unsigned long b)
>> {
>> 	WARN_ON(b);
>> }
>>
>> Without your change with GCC you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	0b 03 00 00 	tdnei   r3,0
>>       12d4:	4e 80 00 20 	blr
>>
>>
>> With the !! change you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	31 23 ff ff 	addic   r9,r3,-1
>>       12d4:	7d 29 19 10 	subfe   r9,r9,r3
>>       12d8:	0b 09 00 00 	tdnei   r9,0
>>       12dc:	4e 80 00 20 	blr
> 
> Yeah that's a pity.
> 
> We could do something like below, which is ugly, but would be better
> than having to revert the whole thing.

Yes looks nice, we already had that kind of stuff in the past, even more ugly.

> 
> Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

What's the warning ?


> 
> So possibly we need a CLANG ifdef around the whole thing, and use the
> old style warn for clang.
> 

Christophe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 14:53             ` Christophe Leroy
  0 siblings, 0 replies; 48+ messages in thread
From: Christophe Leroy @ 2021-08-26 14:53 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev



Le 26/08/2021 à 16:45, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>>> Nathan Chancellor <nathan@kernel.org> writes:
>>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>>> flexibility to GCC.
>>> ...
>>>>
>>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>>> klist_add_tail to trigger over and over on boot when compiling with
>>>> clang:
>>
>> ...
>>
>>>
>>> This patch seems to fix it. Not sure if that's just papering over it though.
>>>
>>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>>> index 1ee0f22313ee..75fcb4370d96 100644
>>> --- a/arch/powerpc/include/asm/bug.h
>>> +++ b/arch/powerpc/include/asm/bug.h
>>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>>    								\
>>>    			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>>    				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>>> -				   __label_warn_on, "r" (x));	\
>>> +				   __label_warn_on, "r" (!!(x))); \
>>>    			break;					\
>>>    __label_warn_on:						\
>>>    			__ret_warn_on = true;			\
>>
>> But for a simple WARN_ON() call:
>>
>> void test(unsigned long b)
>> {
>> 	WARN_ON(b);
>> }
>>
>> Without your change with GCC you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	0b 03 00 00 	tdnei   r3,0
>>       12d4:	4e 80 00 20 	blr
>>
>>
>> With the !! change you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	31 23 ff ff 	addic   r9,r3,-1
>>       12d4:	7d 29 19 10 	subfe   r9,r9,r3
>>       12d8:	0b 09 00 00 	tdnei   r9,0
>>       12dc:	4e 80 00 20 	blr
> 
> Yeah that's a pity.
> 
> We could do something like below, which is ugly, but would be better
> than having to revert the whole thing.

Yes looks nice, we already had that kind of stuff in the past, even more ugly.

> 
> Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

What's the warning ?


> 
> So possibly we need a CLANG ifdef around the whole thing, and use the
> old style warn for clang.
> 

Christophe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26 14:37             ` Segher Boessenkool
@ 2021-08-26 15:04               ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26 15:04 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
>> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> >> trap is always just speculated not to hit. Branches may also have a
>> >> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> >> vs 4 per cycle on POWER9).
>> >> > 
>> >> > I thought only *taken* branches are just one per cycle?
>> >> 
>> >> Taken branches are fetched by the front end at one per cycle (assuming 
>> >> they hit the BTAC), but all branches have to be executed by BR at one 
>> >> per cycle
>> > 
>> > This is not true.  (Simple) predicted not-taken conditional branches are
>> > just folded out, never hit the issue queues.  And they are fetched as
>> > many together as fit in a fetch group, can complete without limits as
>> > well.
>> 
>> No, they are all dispatched and issue to the BRU for execution. It's 
>> trivial to construct a test of a lot of not taken branches in a row
>> and time a loop of it to see it executes at 1 cycle per branch.
> 
> (s/dispatched/issued/)

?

> 
> Huh, this was true on p8 already.  Sorry for my confusion.  In my
> defence, this doesn't matter for performance on "real code".
> 
>> > Correctly predicted simple conditional branches just get their prediction
>> > validated (and that is not done in the execution units).  Incorrectly
>> > predicted branches the same, but those cause a redirect and refetch.
>> 
>> How could it validate prediction without issuing? It wouldn't know when
>> sources are ready.
> 
> In the backend.  But that is just how it worked on older cores :-/

Okay. I don't know about older cores than POWER9. Backend would normally 
include execution though. Only other place you could do it if you don't
issue/exec would be after it goes back in order, like completion. But
that would be horrible for mispredict penalty.

>> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> need a special builtin support that does something to create the table
>> >> entry, or a guarantee that we could put an inline asm right after the
>> >> builtin as a recognized pattern and that would give us the instruction
>> >> following the trap.
>> > 
>> > I'm not quite sure what this means.  Can't you always just put a
>> > 
>> > bla:	asm("");
>> > 
>> > in there, and use the address of "bla"?
>> 
>> Not AFAIKS. Put it where?
> 
> After wherever you want to know the address after.  You will have to
> make sure they stay together somehow.

I still don't follow.

> It is much easier to get the address of something, not the address after
> it.  If you know it is just one insn anyway, that isn't hard to handle
> either (even if prefixed insns exist).
> 
>> > If not, you need to say a lot
>> > more about what you actually want to do :-/
>> 
>> We need to put the address (or relative offset) of the trap instruction
>> into an entry in a different section. Basically this:
>> 
>>    __builtin_trap();
>>    asm ("1:                               \n\t"
>>         "    .section __bug_table,\"aw\"  \n\t"
>>         "2:  .4byte 1b - 2b - 4           \n\t"
>>         "    .previous");
>> 
>> Where the 1: label must follow immediately after the trap instruction, 
>> and where the asm must be emitted even for the case of no-return traps 
>> (but anything following the asm could be omitted).
> 
> The address *after* the trap insn?  That is much much harder than the
> address *of* the trap insn.

No the address of the trap instruction, hence the -4. I have the label
after because that is the semantics I have from inline asm.

If you could give a built in that put a label at the address of the trap 
instruction that could be used later by inline asm then that could work
too:

    __builtin_labeled_trap("1:");
    asm ("    .section __bug_table,\"aw\"  \n\t"
         "2:  .4byte 1b - 2b               \n\t"
         "    .previous");

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26 15:04               ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-26 15:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
>> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> >> trap is always just speculated not to hit. Branches may also have a
>> >> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> >> vs 4 per cycle on POWER9).
>> >> > 
>> >> > I thought only *taken* branches are just one per cycle?
>> >> 
>> >> Taken branches are fetched by the front end at one per cycle (assuming 
>> >> they hit the BTAC), but all branches have to be executed by BR at one 
>> >> per cycle
>> > 
>> > This is not true.  (Simple) predicted not-taken conditional branches are
>> > just folded out, never hit the issue queues.  And they are fetched as
>> > many together as fit in a fetch group, can complete without limits as
>> > well.
>> 
>> No, they are all dispatched and issue to the BRU for execution. It's 
>> trivial to construct a test of a lot of not taken branches in a row
>> and time a loop of it to see it executes at 1 cycle per branch.
> 
> (s/dispatched/issued/)

?

> 
> Huh, this was true on p8 already.  Sorry for my confusion.  In my
> defence, this doesn't matter for performance on "real code".
> 
>> > Correctly predicted simple conditional branches just get their prediction
>> > validated (and that is not done in the execution units).  Incorrectly
>> > predicted branches the same, but those cause a redirect and refetch.
>> 
>> How could it validate prediction without issuing? It wouldn't know when
>> sources are ready.
> 
> In the backend.  But that is just how it worked on older cores :-/

Okay. I don't know about older cores than POWER9. Backend would normally 
include execution though. Only other place you could do it if you don't
issue/exec would be after it goes back in order, like completion. But
that would be horrible for mispredict penalty.

>> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> need a special builtin support that does something to create the table
>> >> entry, or a guarantee that we could put an inline asm right after the
>> >> builtin as a recognized pattern and that would give us the instruction
>> >> following the trap.
>> > 
>> > I'm not quite sure what this means.  Can't you always just put a
>> > 
>> > bla:	asm("");
>> > 
>> > in there, and use the address of "bla"?
>> 
>> Not AFAIKS. Put it where?
> 
> After wherever you want to know the address after.  You will have to
> make sure they stay together somehow.

I still don't follow.

> It is much easier to get the address of something, not the address after
> it.  If you know it is just one insn anyway, that isn't hard to handle
> either (even if prefixed insns exist).
> 
>> > If not, you need to say a lot
>> > more about what you actually want to do :-/
>> 
>> We need to put the address (or relative offset) of the trap instruction
>> into an entry in a different section. Basically this:
>> 
>>    __builtin_trap();
>>    asm ("1:                               \n\t"
>>         "    .section __bug_table,\"aw\"  \n\t"
>>         "2:  .4byte 1b - 2b - 4           \n\t"
>>         "    .previous");
>> 
>> Where the 1: label must follow immediately after the trap instruction, 
>> and where the asm must be emitted even for the case of no-return traps 
>> (but anything following the asm could be omitted).
> 
> The address *after* the trap insn?  That is much much harder than the
> address *of* the trap insn.

No the address of the trap instruction, hence the -4. I have the label
after because that is the semantics I have from inline asm.

If you could give a built in that put a label at the address of the trap 
instruction that could be used later by inline asm then that could work
too:

    __builtin_labeled_trap("1:");
    asm ("    .section __bug_table,\"aw\"  \n\t"
         "2:  .4byte 1b - 2b               \n\t"
         "    .previous");

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26 15:04               ` Nicholas Piggin
@ 2021-08-26 15:30                 ` Segher Boessenkool
  -1 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 15:30 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> >> No, they are all dispatched and issue to the BRU for execution. It's 
> >> trivial to construct a test of a lot of not taken branches in a row
> >> and time a loop of it to see it executes at 1 cycle per branch.
> > 
> > (s/dispatched/issued/)
> 
> ?

Dispatch is from decode to the issue queues.  Issue is from there to
execution units.  Dispatch is in-order, issue is not.

> >> How could it validate prediction without issuing? It wouldn't know when
> >> sources are ready.
> > 
> > In the backend.  But that is just how it worked on older cores :-/
> 
> Okay. I don't know about older cores than POWER9. Backend would normally 
> include execution though.
> Only other place you could do it if you don't
> issue/exec would be after it goes back in order, like completion.

You do not have to do the verification in-order: the insn cannot finish
until it is no longer speculative, that takes care of all ordering
needed.

> But that would be horrible for mispredict penalty.

See the previous point.  Also, any insn known to be mispredicted can be
flushed immediately anyway.

> >> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> >> need a special builtin support that does something to create the table
> >> >> entry, or a guarantee that we could put an inline asm right after the
> >> >> builtin as a recognized pattern and that would give us the instruction
> >> >> following the trap.
> >> > 
> >> > I'm not quite sure what this means.  Can't you always just put a
> >> > 
> >> > bla:	asm("");
> >> > 
> >> > in there, and use the address of "bla"?
> >> 
> >> Not AFAIKS. Put it where?
> > 
> > After wherever you want to know the address after.  You will have to
> > make sure they stay together somehow.
> 
> I still don't follow.

some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
empty_asm_that_we_can_take_the_address_of_known_as_B;

You have to make sure the compiler keeps A and B together, does not
insert anything between them, does put them in the assembler output in
the same fragment, etc.

> If you could give a built in that put a label at the address of the trap 
> instruction that could be used later by inline asm then that could work
> too:
> 
>     __builtin_labeled_trap("1:");
>     asm ("    .section __bug_table,\"aw\"  \n\t"
>          "2:  .4byte 1b - 2b               \n\t"
>          "    .previous");

How could a compiler do anything like that?!


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-26 15:30                 ` Segher Boessenkool
  0 siblings, 0 replies; 48+ messages in thread
From: Segher Boessenkool @ 2021-08-26 15:30 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> >> No, they are all dispatched and issue to the BRU for execution. It's 
> >> trivial to construct a test of a lot of not taken branches in a row
> >> and time a loop of it to see it executes at 1 cycle per branch.
> > 
> > (s/dispatched/issued/)
> 
> ?

Dispatch is from decode to the issue queues.  Issue is from there to
execution units.  Dispatch is in-order, issue is not.

> >> How could it validate prediction without issuing? It wouldn't know when
> >> sources are ready.
> > 
> > In the backend.  But that is just how it worked on older cores :-/
> 
> Okay. I don't know about older cores than POWER9. Backend would normally 
> include execution though.
> Only other place you could do it if you don't
> issue/exec would be after it goes back in order, like completion.

You do not have to do the verification in-order: the insn cannot finish
until it is no longer speculative, that takes care of all ordering
needed.

> But that would be horrible for mispredict penalty.

See the previous point.  Also, any insn known to be mispredicted can be
flushed immediately anyway.

> >> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> >> need a special builtin support that does something to create the table
> >> >> entry, or a guarantee that we could put an inline asm right after the
> >> >> builtin as a recognized pattern and that would give us the instruction
> >> >> following the trap.
> >> > 
> >> > I'm not quite sure what this means.  Can't you always just put a
> >> > 
> >> > bla:	asm("");
> >> > 
> >> > in there, and use the address of "bla"?
> >> 
> >> Not AFAIKS. Put it where?
> > 
> > After wherever you want to know the address after.  You will have to
> > make sure they stay together somehow.
> 
> I still don't follow.

some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
empty_asm_that_we_can_take_the_address_of_known_as_B;

You have to make sure the compiler keeps A and B together, does not
insert anything between them, does put them in the assembler output in
the same fragment, etc.

> If you could give a built in that put a label at the address of the trap 
> instruction that could be used later by inline asm then that could work
> too:
> 
>     __builtin_labeled_trap("1:");
>     asm ("    .section __bug_table,\"aw\"  \n\t"
>          "2:  .4byte 1b - 2b               \n\t"
>          "    .previous");

How could a compiler do anything like that?!


Segher

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26  3:21       ` Michael Ellerman
@ 2021-08-26 18:54         ` Nathan Chancellor
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-26 18:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel, linuxppc-dev, clang-built-linux, llvm

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> Nathan Chancellor <nathan@kernel.org> writes:
> > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> >> flexibility to GCC.
> ...
> >
> > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > klist_add_tail to trigger over and over on boot when compiling with
> > clang:
> >
> > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > [    2.177456][    T1] Modules linked in:
> > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178088][    T1] Call Trace:
> > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > [    2.178569][    T1] Instruction dump:
> > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> >
> > Is this a bug with clang or is there something wrong with the patch? The
> > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > command and the warning at boot can be viewed at [2]. If there is any
> > other information I can provide, please let me know.
> >
> > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> 
> Thanks.
> 
> This is the generated assembly:
> 
> c0000000007ff600 <.klist_add_tail>:
> c0000000007ff600:       7c 08 02 a6     mflr    r0
> c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> From:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 	/* no knode deserves to start its life dead */
> 	WARN_ON(knode_dead(knode));
>                 ^^^^^^^^^^^^^^^^^
> }
> 
> Which expands to:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 
> 	({
> 		bool __ret_warn_on = false;
> 		do {
>                 ...
> 			} else {
> 				__label__ __label_warn_on;
> 				do {
> 					asm goto(
> 						"1:   "
> 						"tdnei"
> 						"
> 						" " % 4,
> 						0 " "\n " ".section __ex_table,\"a\";"
> 										" "
> 										".balign 4;"
> 										" "
> 										".long (1b) - . ;"
> 										" "
> 										".long (%l[__label_warn_on]) - . ;"
> 										" "
> 										".previous"
> 										" "
> 										".section __bug_table,\"aw\"\n"
> 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> 										"\t.short %1, %2\n"
> 										".org 2b+%3\n"
> 										".previous\n"
> 						:
> 						: "i"("lib/klist.c"), "i"(62),
> 						  "i"((1 << 0) | ((9) << 8)),
> 						  "i"(sizeof(struct bug_entry)),
> 						  "r"(knode_dead(knode))
>                                                   ^^^^^^^^^^^^^^^^^^^^^
> 
> 						:
> 						: __label_warn_on);
> 					asm("");
> 				} while (0);
> 				break;
> 			__label_warn_on:
> 				__ret_warn_on = true;
> 			}
> 		} while (0);
> 		__builtin_expect(!!(__ret_warn_on), 0);
> 	});
> }
> 
> And knode_dead is:
> 
> #define KNODE_DEAD		1LU
> 
> static bool knode_dead(struct klist_node *knode)
> {
> 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> }
> 
> 
> So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> 
> But in the asm:
> 
> c0000000007ff600 <.klist_add_tail>:
> ...
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> ...
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> 
> In the GCC output you can see it:
> 
> c0000000008c8a30 <.klist_node_init>:
> c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> c0000000008c8a34:       39 40 00 01     li      r10,1
> c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> 
> ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> except bit 0, which is equivalent to "& KNODE_DEAD".
> 
> 
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.
> 
> This patch seems to fix it. Not sure if that's just papering over it though.
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:						\
>  								\
>  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on, "r" (!!(x))); \
>  			break;					\
>  __label_warn_on:						\
>  			__ret_warn_on = true;			\
> 
> 
> Generates:
> 
> c0000000008e2ac0 <.klist_add_tail>:
> c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> c0000000008e2ad4:       38 60 00 01     li      r3,1
> c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-

Thank you for the in-depth explanation and triage! I have gone ahead and
created a standalone reproducer that shows this based on the
preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
so the LLVM developers can take a look.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 18:54         ` Nathan Chancellor
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-26 18:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> Nathan Chancellor <nathan@kernel.org> writes:
> > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> >> flexibility to GCC.
> ...
> >
> > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > klist_add_tail to trigger over and over on boot when compiling with
> > clang:
> >
> > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > [    2.177456][    T1] Modules linked in:
> > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178088][    T1] Call Trace:
> > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > [    2.178569][    T1] Instruction dump:
> > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> >
> > Is this a bug with clang or is there something wrong with the patch? The
> > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > command and the warning at boot can be viewed at [2]. If there is any
> > other information I can provide, please let me know.
> >
> > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> 
> Thanks.
> 
> This is the generated assembly:
> 
> c0000000007ff600 <.klist_add_tail>:
> c0000000007ff600:       7c 08 02 a6     mflr    r0
> c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> From:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 	/* no knode deserves to start its life dead */
> 	WARN_ON(knode_dead(knode));
>                 ^^^^^^^^^^^^^^^^^
> }
> 
> Which expands to:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 
> 	({
> 		bool __ret_warn_on = false;
> 		do {
>                 ...
> 			} else {
> 				__label__ __label_warn_on;
> 				do {
> 					asm goto(
> 						"1:   "
> 						"tdnei"
> 						"
> 						" " % 4,
> 						0 " "\n " ".section __ex_table,\"a\";"
> 										" "
> 										".balign 4;"
> 										" "
> 										".long (1b) - . ;"
> 										" "
> 										".long (%l[__label_warn_on]) - . ;"
> 										" "
> 										".previous"
> 										" "
> 										".section __bug_table,\"aw\"\n"
> 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> 										"\t.short %1, %2\n"
> 										".org 2b+%3\n"
> 										".previous\n"
> 						:
> 						: "i"("lib/klist.c"), "i"(62),
> 						  "i"((1 << 0) | ((9) << 8)),
> 						  "i"(sizeof(struct bug_entry)),
> 						  "r"(knode_dead(knode))
>                                                   ^^^^^^^^^^^^^^^^^^^^^
> 
> 						:
> 						: __label_warn_on);
> 					asm("");
> 				} while (0);
> 				break;
> 			__label_warn_on:
> 				__ret_warn_on = true;
> 			}
> 		} while (0);
> 		__builtin_expect(!!(__ret_warn_on), 0);
> 	});
> }
> 
> And knode_dead is:
> 
> #define KNODE_DEAD		1LU
> 
> static bool knode_dead(struct klist_node *knode)
> {
> 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> }
> 
> 
> So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> 
> But in the asm:
> 
> c0000000007ff600 <.klist_add_tail>:
> ...
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> ...
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> 
> In the GCC output you can see it:
> 
> c0000000008c8a30 <.klist_node_init>:
> c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> c0000000008c8a34:       39 40 00 01     li      r10,1
> c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> 
> ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> except bit 0, which is equivalent to "& KNODE_DEAD".
> 
> 
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.
> 
> This patch seems to fix it. Not sure if that's just papering over it though.
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:						\
>  								\
>  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on, "r" (!!(x))); \
>  			break;					\
>  __label_warn_on:						\
>  			__ret_warn_on = true;			\
> 
> 
> Generates:
> 
> c0000000008e2ac0 <.klist_add_tail>:
> c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> c0000000008e2ad4:       38 60 00 01     li      r3,1
> c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-

Thank you for the in-depth explanation and triage! I have gone ahead and
created a standalone reproducer that shows this based on the
preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
so the LLVM developers can take a look.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26 18:54         ` Nathan Chancellor
@ 2021-08-26 23:55           ` Nathan Chancellor
  -1 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-26 23:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras,
	linuxppc-dev, Christophe Leroy, Nick Desaulniers

On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> > Nathan Chancellor <nathan@kernel.org> writes:
> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> > >> flexibility to GCC.
> > ...
> > >
> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > > klist_add_tail to trigger over and over on boot when compiling with
> > > clang:
> > >
> > > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > > [    2.177456][    T1] Modules linked in:
> > > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178088][    T1] Call Trace:
> > > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > > [    2.178569][    T1] Instruction dump:
> > > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> > >
> > > Is this a bug with clang or is there something wrong with the patch? The
> > > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > > command and the warning at boot can be viewed at [2]. If there is any
> > > other information I can provide, please let me know.
> > >
> > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> > 
> > Thanks.
> > 
> > This is the generated assembly:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > c0000000007ff600:       7c 08 02 a6     mflr    r0
> > c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> > c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> > c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> > c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> > c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> > c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> > c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> > c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> > c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> > c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> > c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > From:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 	/* no knode deserves to start its life dead */
> > 	WARN_ON(knode_dead(knode));
> >                 ^^^^^^^^^^^^^^^^^
> > }
> > 
> > Which expands to:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 
> > 	({
> > 		bool __ret_warn_on = false;
> > 		do {
> >                 ...
> > 			} else {
> > 				__label__ __label_warn_on;
> > 				do {
> > 					asm goto(
> > 						"1:   "
> > 						"tdnei"
> > 						"
> > 						" " % 4,
> > 						0 " "\n " ".section __ex_table,\"a\";"
> > 										" "
> > 										".balign 4;"
> > 										" "
> > 										".long (1b) - . ;"
> > 										" "
> > 										".long (%l[__label_warn_on]) - . ;"
> > 										" "
> > 										".previous"
> > 										" "
> > 										".section __bug_table,\"aw\"\n"
> > 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> > 										"\t.short %1, %2\n"
> > 										".org 2b+%3\n"
> > 										".previous\n"
> > 						:
> > 						: "i"("lib/klist.c"), "i"(62),
> > 						  "i"((1 << 0) | ((9) << 8)),
> > 						  "i"(sizeof(struct bug_entry)),
> > 						  "r"(knode_dead(knode))
> >                                                   ^^^^^^^^^^^^^^^^^^^^^
> > 
> > 						:
> > 						: __label_warn_on);
> > 					asm("");
> > 				} while (0);
> > 				break;
> > 			__label_warn_on:
> > 				__ret_warn_on = true;
> > 			}
> > 		} while (0);
> > 		__builtin_expect(!!(__ret_warn_on), 0);
> > 	});
> > }
> > 
> > And knode_dead is:
> > 
> > #define KNODE_DEAD		1LU
> > 
> > static bool knode_dead(struct klist_node *knode)
> > {
> > 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> > }
> > 
> > 
> > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> > 
> > But in the asm:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > ...
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > ...
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> > 
> > In the GCC output you can see it:
> > 
> > c0000000008c8a30 <.klist_node_init>:
> > c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> > c0000000008c8a34:       39 40 00 01     li      r10,1
> > c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> > c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> > c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> > c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> > c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> > c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> > 
> > ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> > except bit 0, which is equivalent to "& KNODE_DEAD".
> > 
> > 
> > So there seems to be some misunderstanding with clang, it doesn't like
> > us passing an expression to the inline asm.
> > 
> > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> > doesn't have to just be a variable name.
> > 
> > This patch seems to fix it. Not sure if that's just papering over it though.
> > 
> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> > index 1ee0f22313ee..75fcb4370d96 100644
> > --- a/arch/powerpc/include/asm/bug.h
> > +++ b/arch/powerpc/include/asm/bug.h
> > @@ -119,7 +119,7 @@ __label_warn_on:						\
> >  								\
> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
> >  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> > -				   __label_warn_on, "r" (x));	\
> > +				   __label_warn_on, "r" (!!(x))); \
> >  			break;					\
> >  __label_warn_on:						\
> >  			__ret_warn_on = true;			\
> > 
> > 
> > Generates:
> > 
> > c0000000008e2ac0 <.klist_add_tail>:
> > c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> > c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> > c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> > c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> > c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> > c0000000008e2ad4:       38 60 00 01     li      r3,1
> > c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> > c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> > c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> > c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> > c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> > c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> > c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> > c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> > c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> > c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> > c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-
> 
> Thank you for the in-depth explanation and triage! I have gone ahead and
> created a standalone reproducer that shows this based on the
> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
> so the LLVM developers can take a look.

Based on Eli Friedman's comment in the bug, would something like this
work and not regress GCC? I noticed that the BUG_ON macro does a cast as
well. Nick pointed out to me privately that we have run into what seems
like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
sign extension for RV64I").

Cheers,
Nathan

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..35022667f57d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on:                                            \
                                                                \
                        WARN_ENTRY(PPC_TLNEI " %4, 0",          \
                                   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
-                                  __label_warn_on, "r" (x));   \
+                                  __label_warn_on, "r" ((__force long)(x)));   \
                        break;                                  \
 __label_warn_on:                                               \
                        __ret_warn_on = true;                   \

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-26 23:55           ` Nathan Chancellor
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2021-08-26 23:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, Nick Desaulniers, linux-kernel, clang-built-linux,
	Paul Mackerras, linuxppc-dev

On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> > Nathan Chancellor <nathan@kernel.org> writes:
> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> > >> flexibility to GCC.
> > ...
> > >
> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > > klist_add_tail to trigger over and over on boot when compiling with
> > > clang:
> > >
> > > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > > [    2.177456][    T1] Modules linked in:
> > > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178088][    T1] Call Trace:
> > > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > > [    2.178569][    T1] Instruction dump:
> > > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> > >
> > > Is this a bug with clang or is there something wrong with the patch? The
> > > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > > command and the warning at boot can be viewed at [2]. If there is any
> > > other information I can provide, please let me know.
> > >
> > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> > 
> > Thanks.
> > 
> > This is the generated assembly:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > c0000000007ff600:       7c 08 02 a6     mflr    r0
> > c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> > c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> > c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> > c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> > c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> > c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> > c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> > c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> > c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> > c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> > c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > From:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 	/* no knode deserves to start its life dead */
> > 	WARN_ON(knode_dead(knode));
> >                 ^^^^^^^^^^^^^^^^^
> > }
> > 
> > Which expands to:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 
> > 	({
> > 		bool __ret_warn_on = false;
> > 		do {
> >                 ...
> > 			} else {
> > 				__label__ __label_warn_on;
> > 				do {
> > 					asm goto(
> > 						"1:   "
> > 						"tdnei"
> > 						"
> > 						" " % 4,
> > 						0 " "\n " ".section __ex_table,\"a\";"
> > 										" "
> > 										".balign 4;"
> > 										" "
> > 										".long (1b) - . ;"
> > 										" "
> > 										".long (%l[__label_warn_on]) - . ;"
> > 										" "
> > 										".previous"
> > 										" "
> > 										".section __bug_table,\"aw\"\n"
> > 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> > 										"\t.short %1, %2\n"
> > 										".org 2b+%3\n"
> > 										".previous\n"
> > 						:
> > 						: "i"("lib/klist.c"), "i"(62),
> > 						  "i"((1 << 0) | ((9) << 8)),
> > 						  "i"(sizeof(struct bug_entry)),
> > 						  "r"(knode_dead(knode))
> >                                                   ^^^^^^^^^^^^^^^^^^^^^
> > 
> > 						:
> > 						: __label_warn_on);
> > 					asm("");
> > 				} while (0);
> > 				break;
> > 			__label_warn_on:
> > 				__ret_warn_on = true;
> > 			}
> > 		} while (0);
> > 		__builtin_expect(!!(__ret_warn_on), 0);
> > 	});
> > }
> > 
> > And knode_dead is:
> > 
> > #define KNODE_DEAD		1LU
> > 
> > static bool knode_dead(struct klist_node *knode)
> > {
> > 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> > }
> > 
> > 
> > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> > 
> > But in the asm:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > ...
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > ...
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> > 
> > In the GCC output you can see it:
> > 
> > c0000000008c8a30 <.klist_node_init>:
> > c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> > c0000000008c8a34:       39 40 00 01     li      r10,1
> > c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> > c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> > c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> > c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> > c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> > c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> > 
> > ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> > except bit 0, which is equivalent to "& KNODE_DEAD".
> > 
> > 
> > So there seems to be some misunderstanding with clang, it doesn't like
> > us passing an expression to the inline asm.
> > 
> > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> > doesn't have to just be a variable name.
> > 
> > This patch seems to fix it. Not sure if that's just papering over it though.
> > 
> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> > index 1ee0f22313ee..75fcb4370d96 100644
> > --- a/arch/powerpc/include/asm/bug.h
> > +++ b/arch/powerpc/include/asm/bug.h
> > @@ -119,7 +119,7 @@ __label_warn_on:						\
> >  								\
> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
> >  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> > -				   __label_warn_on, "r" (x));	\
> > +				   __label_warn_on, "r" (!!(x))); \
> >  			break;					\
> >  __label_warn_on:						\
> >  			__ret_warn_on = true;			\
> > 
> > 
> > Generates:
> > 
> > c0000000008e2ac0 <.klist_add_tail>:
> > c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> > c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> > c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> > c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> > c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> > c0000000008e2ad4:       38 60 00 01     li      r3,1
> > c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> > c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> > c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> > c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> > c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> > c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> > c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> > c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> > c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> > c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> > c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-
> 
> Thank you for the in-depth explanation and triage! I have gone ahead and
> created a standalone reproducer that shows this based on the
> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
> so the LLVM developers can take a look.

Based on Eli Friedman's comment in the bug, would something like this
work and not regress GCC? I noticed that the BUG_ON macro does a cast as
well. Nick pointed out to me privately that we have run into what seems
like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
sign extension for RV64I").

Cheers,
Nathan

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..35022667f57d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on:                                            \
                                                                \
                        WARN_ENTRY(PPC_TLNEI " %4, 0",          \
                                   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
-                                  __label_warn_on, "r" (x));   \
+                                  __label_warn_on, "r" ((__force long)(x)));   \
                        break;                                  \
 __label_warn_on:                                               \
                        __ret_warn_on = true;                   \

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
  2021-08-26 15:30                 ` Segher Boessenkool
@ 2021-08-27  1:28                   ` Nicholas Piggin
  -1 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-27  1:28 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel,
	linuxppc-dev, Michael Ellerman, Paul Mackerras

Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am:
> On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
>> >> No, they are all dispatched and issue to the BRU for execution. It's 
>> >> trivial to construct a test of a lot of not taken branches in a row
>> >> and time a loop of it to see it executes at 1 cycle per branch.
>> > 
>> > (s/dispatched/issued/)
>> 
>> ?
> 
> Dispatch is from decode to the issue queues.  Issue is from there to
> execution units.  Dispatch is in-order, issue is not.

I know what those mean, I wonder what your s/dispatched/issued means.
I was saying they are dispatched in response to you saying they never
hit the issue queue.

> 
>> >> How could it validate prediction without issuing? It wouldn't know when
>> >> sources are ready.
>> > 
>> > In the backend.  But that is just how it worked on older cores :-/
>> 
>> Okay. I don't know about older cores than POWER9. Backend would normally 
>> include execution though.
>> Only other place you could do it if you don't
>> issue/exec would be after it goes back in order, like completion.
> 
> You do not have to do the verification in-order: the insn cannot finish
> until it is no longer speculative, that takes care of all ordering
> needed.

Branches *can* finish out of order and speculative as they do in P9 and 
P10. Are you talking about these CPUs or something else which can 
verify branches without issuing them?

> 
>> But that would be horrible for mispredict penalty.
> 
> See the previous point.  Also, any insn known to be mispredicted can be
> flushed immediately anyway.

The point is it has to know sources (CR) to verify (aka execute) the 
branch prediction was right, and if it needs sources then it needs to 
either issue and execute in the out of order part, or it needs to wait
until completion which would seem to be prohibitively expensive. I am
interested to know how it works.

> 
>> >> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> >> need a special builtin support that does something to create the table
>> >> >> entry, or a guarantee that we could put an inline asm right after the
>> >> >> builtin as a recognized pattern and that would give us the instruction
>> >> >> following the trap.
>> >> > 
>> >> > I'm not quite sure what this means.  Can't you always just put a
>> >> > 
>> >> > bla:	asm("");
>> >> > 
>> >> > in there, and use the address of "bla"?
>> >> 
>> >> Not AFAIKS. Put it where?
>> > 
>> > After wherever you want to know the address after.  You will have to
>> > make sure they stay together somehow.
>> 
>> I still don't follow.
> 
> some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
> empty_asm_that_we_can_take_the_address_of_known_as_B;
> 
> You have to make sure the compiler keeps A and B together, does not
> insert anything between them, does put them in the assembler output in
> the same fragment, etc.

How does all this help our problem of putting the address of the trap 
into the table?

> 
>> If you could give a built in that put a label at the address of the trap 
>> instruction that could be used later by inline asm then that could work
>> too:
>> 
>>     __builtin_labeled_trap("1:");
>>     asm ("    .section __bug_table,\"aw\"  \n\t"
>>          "2:  .4byte 1b - 2b               \n\t"
>>          "    .previous");
> 
> How could a compiler do anything like that?!

How could it add a label at the trap instruction it generates? It didn't 
seem like an outlandish thing to do, but I'm not a compiler writer. It was 
just a handwaving idea to show what we want to be able to do.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
@ 2021-08-27  1:28                   ` Nicholas Piggin
  0 siblings, 0 replies; 48+ messages in thread
From: Nicholas Piggin @ 2021-08-27  1:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am:
> On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
>> >> No, they are all dispatched and issue to the BRU for execution. It's 
>> >> trivial to construct a test of a lot of not taken branches in a row
>> >> and time a loop of it to see it executes at 1 cycle per branch.
>> > 
>> > (s/dispatched/issued/)
>> 
>> ?
> 
> Dispatch is from decode to the issue queues.  Issue is from there to
> execution units.  Dispatch is in-order, issue is not.

I know what those mean, I wonder what your s/dispatched/issued means.
I was saying they are dispatched in response to you saying they never
hit the issue queue.

> 
>> >> How could it validate prediction without issuing? It wouldn't know when
>> >> sources are ready.
>> > 
>> > In the backend.  But that is just how it worked on older cores :-/
>> 
>> Okay. I don't know about older cores than POWER9. Backend would normally 
>> include execution though.
>> Only other place you could do it if you don't
>> issue/exec would be after it goes back in order, like completion.
> 
> You do not have to do the verification in-order: the insn cannot finish
> until it is no longer speculative, that takes care of all ordering
> needed.

Branches *can* finish out of order and speculative as they do in P9 and 
P10. Are you talking about these CPUs or something else which can 
verify branches without issuing them?

> 
>> But that would be horrible for mispredict penalty.
> 
> See the previous point.  Also, any insn known to be mispredicted can be
> flushed immediately anyway.

The point is it has to know sources (CR) to verify (aka execute) the 
branch prediction was right, and if it needs sources then it needs to 
either issue and execute in the out of order part, or it needs to wait
until completion which would seem to be prohibitively expensive. I am
interested to know how it works.

> 
>> >> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> >> need a special builtin support that does something to create the table
>> >> >> entry, or a guarantee that we could put an inline asm right after the
>> >> >> builtin as a recognized pattern and that would give us the instruction
>> >> >> following the trap.
>> >> > 
>> >> > I'm not quite sure what this means.  Can't you always just put a
>> >> > 
>> >> > bla:	asm("");
>> >> > 
>> >> > in there, and use the address of "bla"?
>> >> 
>> >> Not AFAIKS. Put it where?
>> > 
>> > After wherever you want to know the address after.  You will have to
>> > make sure they stay together somehow.
>> 
>> I still don't follow.
> 
> some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
> empty_asm_that_we_can_take_the_address_of_known_as_B;
> 
> You have to make sure the compiler keeps A and B together, does not
> insert anything between them, does put them in the assembler output in
> the same fragment, etc.

How does all this help our problem of putting the address of the trap 
into the table?

> 
>> If you could give a built in that put a label at the address of the trap 
>> instruction that could be used later by inline asm then that could work
>> too:
>> 
>>     __builtin_labeled_trap("1:");
>>     asm ("    .section __bug_table,\"aw\"  \n\t"
>>          "2:  .4byte 1b - 2b               \n\t"
>>          "    .previous");
> 
> How could a compiler do anything like that?!

How could it add a label at the trap instruction it generates? It didn't 
seem like an outlandish thing to do, but I'm not a compiler writer. It was 
just a handwaving idea to show what we want to be able to do.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
  2021-08-26 23:55           ` Nathan Chancellor
@ 2021-08-27  7:53             ` Michael Ellerman
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-27  7:53 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras,
	linuxppc-dev, Christophe Leroy, Nick Desaulniers

Nathan Chancellor <nathan@kernel.org> writes:
> On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
>> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
>> > Nathan Chancellor <nathan@kernel.org> writes:
>> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> > >> flexibility to GCC.
>> > ...
>> > >
>> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> > > klist_add_tail to trigger over and over on boot when compiling with
>> > > clang:
>> > >
...
>> > 
>> > This patch seems to fix it. Not sure if that's just papering over it though.
>> > 
>> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> > index 1ee0f22313ee..75fcb4370d96 100644
>> > --- a/arch/powerpc/include/asm/bug.h
>> > +++ b/arch/powerpc/include/asm/bug.h
>> > @@ -119,7 +119,7 @@ __label_warn_on:						\
>> >  								\
>> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>> >  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>> > -				   __label_warn_on, "r" (x));	\
>> > +				   __label_warn_on, "r" (!!(x))); \
>> >  			break;					\
>> >  __label_warn_on:						\
>> >  			__ret_warn_on = true;			\
>> > 
>> 
>> Thank you for the in-depth explanation and triage! I have gone ahead and
>> created a standalone reproducer that shows this based on the
>> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
>> so the LLVM developers can take a look.
>
> Based on Eli Friedman's comment in the bug, would something like this
> work and not regress GCC? I noticed that the BUG_ON macro does a cast as
> well. Nick pointed out to me privately that we have run into what seems
> like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
> sign extension for RV64I").

Yes, I read that comment this morning, and then landed at the same fix
via digging through the history of our bug code.

We in fact fixed a similar bug 16 years ago :}

32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels")

Which is when we first started adding the cast to long.


> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..35022667f57d 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:                                            \
>                                                                 \
>                         WARN_ENTRY(PPC_TLNEI " %4, 0",          \
>                                    BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> -                                  __label_warn_on, "r" (x));   \
> +                                  __label_warn_on, "r" ((__force long)(x)));   \
>                         break;                                  \
>  __label_warn_on:                                               \
>                         __ret_warn_on = true;                   \


Yeah that fixes the clang build for me.

For GCC it seems to generate the same code in the simple cases:

void test_warn_on_ulong(unsigned long b)
{
        WARN_ON(b);
}

void test_warn_on_int(int b)
{
        WARN_ON(b);
}

I get:

0000000000000020 <.test_warn_on_ulong>:
  20:   0b 03 00 00     tdnei   r3,0
  24:   4e 80 00 20     blr
  28:   60 00 00 00     nop
  2c:   60 00 00 00     nop

0000000000000030 <.test_warn_on_int>:
  30:   0b 03 00 00     tdnei   r3,0
  34:   4e 80 00 20     blr

Both before and after the change. So that's good.

For:

void test_warn_on_int_addition(int b)
{
        WARN_ON(b+1);
}

Before I get:

0000000000000040 <.test_warn_on_int_addition>:
  40:   38 63 00 01     addi    r3,r3,1
  44:   0b 03 00 00     tdnei   r3,0
  48:   4e 80 00 20     blr

vs after:

0000000000000040 <.test_warn_on_int_addition>:
  40:   38 63 00 01     addi    r3,r3,1
  44:   7c 63 07 b4     extsw   r3,r3
  48:   0b 03 00 00     tdnei   r3,0
  4c:   4e 80 00 20     blr


So there's an extra sign extension we don't need, but I guess we can
probably live with that.

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
@ 2021-08-27  7:53             ` Michael Ellerman
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Ellerman @ 2021-08-27  7:53 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: llvm, Nick Desaulniers, linux-kernel, clang-built-linux,
	Paul Mackerras, linuxppc-dev

Nathan Chancellor <nathan@kernel.org> writes:
> On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
>> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
>> > Nathan Chancellor <nathan@kernel.org> writes:
>> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> > >> flexibility to GCC.
>> > ...
>> > >
>> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> > > klist_add_tail to trigger over and over on boot when compiling with
>> > > clang:
>> > >
...
>> > 
>> > This patch seems to fix it. Not sure if that's just papering over it though.
>> > 
>> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> > index 1ee0f22313ee..75fcb4370d96 100644
>> > --- a/arch/powerpc/include/asm/bug.h
>> > +++ b/arch/powerpc/include/asm/bug.h
>> > @@ -119,7 +119,7 @@ __label_warn_on:						\
>> >  								\
>> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>> >  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>> > -				   __label_warn_on, "r" (x));	\
>> > +				   __label_warn_on, "r" (!!(x))); \
>> >  			break;					\
>> >  __label_warn_on:						\
>> >  			__ret_warn_on = true;			\
>> > 
>> 
>> Thank you for the in-depth explanation and triage! I have gone ahead and
>> created a standalone reproducer that shows this based on the
>> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
>> so the LLVM developers can take a look.
>
> Based on Eli Friedman's comment in the bug, would something like this
> work and not regress GCC? I noticed that the BUG_ON macro does a cast as
> well. Nick pointed out to me privately that we have run into what seems
> like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
> sign extension for RV64I").

Yes, I read that comment this morning, and then landed at the same fix
via digging through the history of our bug code.

We in fact fixed a similar bug 16 years ago :}

32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels")

Which is when we first started adding the cast to long.


> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..35022667f57d 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:                                            \
>                                                                 \
>                         WARN_ENTRY(PPC_TLNEI " %4, 0",          \
>                                    BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> -                                  __label_warn_on, "r" (x));   \
> +                                  __label_warn_on, "r" ((__force long)(x)));   \
>                         break;                                  \
>  __label_warn_on:                                               \
>                         __ret_warn_on = true;                   \


Yeah that fixes the clang build for me.

For GCC it seems to generate the same code in the simple cases:

void test_warn_on_ulong(unsigned long b)
{
        WARN_ON(b);
}

void test_warn_on_int(int b)
{
        WARN_ON(b);
}

I get:

0000000000000020 <.test_warn_on_ulong>:
  20:   0b 03 00 00     tdnei   r3,0
  24:   4e 80 00 20     blr
  28:   60 00 00 00     nop
  2c:   60 00 00 00     nop

0000000000000030 <.test_warn_on_int>:
  30:   0b 03 00 00     tdnei   r3,0
  34:   4e 80 00 20     blr

Both before and after the change. So that's good.

For:

void test_warn_on_int_addition(int b)
{
        WARN_ON(b+1);
}

Before I get:

0000000000000040 <.test_warn_on_int_addition>:
  40:   38 63 00 01     addi    r3,r3,1
  44:   0b 03 00 00     tdnei   r3,0
  48:   4e 80 00 20     blr

vs after:

0000000000000040 <.test_warn_on_int_addition>:
  40:   38 63 00 01     addi    r3,r3,1
  44:   7c 63 07 b4     extsw   r3,r3
  48:   0b 03 00 00     tdnei   r3,0
  4c:   4e 80 00 20     blr


So there's an extra sign extension we don't need, but I guess we can
probably live with that.

cheers

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2021-08-27  7:54 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy
2021-04-13 16:38 ` Christophe Leroy
2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy
2021-04-13 16:38   ` Christophe Leroy
2021-08-13  6:19   ` Nicholas Piggin
2021-08-13  6:19     ` Nicholas Piggin
2021-08-15  3:49   ` Michael Ellerman
2021-08-15  3:49     ` Michael Ellerman
2021-08-25 21:25   ` Nathan Chancellor
2021-08-25 21:25     ` Nathan Chancellor
2021-08-26  3:21     ` Michael Ellerman
2021-08-26  3:21       ` Michael Ellerman
2021-08-26  6:37       ` Christophe Leroy
2021-08-26  6:37         ` Christophe Leroy
2021-08-26 13:47         ` Segher Boessenkool
2021-08-26 13:47           ` Segher Boessenkool
2021-08-26 14:45         ` Michael Ellerman
2021-08-26 14:45           ` Michael Ellerman
2021-08-26 14:53           ` Christophe Leroy
2021-08-26 14:53             ` Christophe Leroy
2021-08-26 14:12       ` Segher Boessenkool
2021-08-26 14:12         ` Segher Boessenkool
2021-08-26 18:54       ` Nathan Chancellor
2021-08-26 18:54         ` Nathan Chancellor
2021-08-26 23:55         ` Nathan Chancellor
2021-08-26 23:55           ` Nathan Chancellor
2021-08-27  7:53           ` Michael Ellerman
2021-08-27  7:53             ` Michael Ellerman
2021-08-13  6:08 ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Nicholas Piggin
2021-08-13  6:08   ` Nicholas Piggin
2021-08-18 15:06   ` Segher Boessenkool
2021-08-18 15:06     ` Segher Boessenkool
2021-08-26  3:26     ` Nicholas Piggin
2021-08-26  3:26       ` Nicholas Piggin
2021-08-26 12:49       ` Segher Boessenkool
2021-08-26 12:49         ` Segher Boessenkool
2021-08-26 13:57         ` Nicholas Piggin
2021-08-26 13:57           ` Nicholas Piggin
2021-08-26 14:37           ` Segher Boessenkool
2021-08-26 14:37             ` Segher Boessenkool
2021-08-26 15:04             ` Nicholas Piggin
2021-08-26 15:04               ` Nicholas Piggin
2021-08-26 15:30               ` Segher Boessenkool
2021-08-26 15:30                 ` Segher Boessenkool
2021-08-27  1:28                 ` Nicholas Piggin
2021-08-27  1:28                   ` Nicholas Piggin
2021-08-18 13:38 ` Michael Ellerman
2021-08-18 13:38   ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.