* [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 @ 2021-04-13 16:38 Christophe Leroy 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw) To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: linux-kernel, linuxppc-dev powerpc BUG_ON() and WARN_ON() are based on using twnei instruction. For catching simple conditions like a variable having value 0, this is efficient because it does the test and the trap at the same time. But most conditions used with BUG_ON or WARN_ON are more complex and forces GCC to format the condition into a 0 or 1 value in a register. This will usually require 2 to 3 instructions. The most efficient solution would be to use __builtin_trap() because GCC is able to optimise the use of the different trap instructions based on the requested condition, but this is complex if not impossible for the following reasons: - __builtin_trap() is a non-recoverable instruction, so it can't be used for WARN_ON - Knowing which line of code generated the trap would require the analysis of DWARF information. This is not a feature we have today. As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops") the way WARN_ON() is implemented is suboptimal. That commit also mentions an issue with 'long long' condition. It fixed it for WARN_ON() but the same problem still exists today with BUG_ON() on PPC32. It will be fixed by using the generic implementation. By using the generic implementation, gcc will naturally generate a branch to the unconditional trap generated by BUG(). As modern powerpc implement zero-cycle branch, that's even more efficient. And for the functions using WARN_ON() and its return, the test on return from WARN_ON() is now also used for the WARN_ON() itself. On PPC64 we don't want it because we want to be able to use CFAR register to track how we entered the code that trapped. The CFAR register would be clobbered by the branch. A simple test function: unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 0000046c <test9w>: 46c: 7c 89 00 34 cntlzw r9,r4 470: 55 29 d9 7e rlwinm r9,r9,27,5,31 474: 0f 09 00 00 twnei r9,0 478: 2c 04 00 00 cmpwi r4,0 47c: 41 82 00 0c beq 488 <test9w+0x1c> 480: 7c 63 23 96 divwu r3,r3,r4 484: 4e 80 00 20 blr 488: 38 60 00 00 li r3,0 48c: 4e 80 00 20 blr After the patch: 00000468 <test9w>: 468: 2c 04 00 00 cmpwi r4,0 46c: 41 82 00 0c beq 478 <test9w+0x10> 470: 7c 63 23 96 divwu r3,r3,r4 474: 4e 80 00 20 blr 478: 0f e0 00 00 twui r0,0 47c: 38 60 00 00 li r3,0 480: 4e 80 00 20 blr So we see before the patch we need 3 instructions on the likely path to handle the WARN_ON(). With the patch the trap goes on the unlikely path. See below the difference at the entry of system_call_exception where we have several BUG_ON(), allthough less impressing. With the patch: 00000000 <system_call_exception>: 0: 81 6a 00 84 lwz r11,132(r10) 4: 90 6a 00 88 stw r3,136(r10) 8: 71 60 00 02 andi. r0,r11,2 c: 41 82 00 70 beq 7c <system_call_exception+0x7c> 10: 71 60 40 00 andi. r0,r11,16384 14: 41 82 00 6c beq 80 <system_call_exception+0x80> 18: 71 6b 80 00 andi. r11,r11,32768 1c: 41 82 00 68 beq 84 <system_call_exception+0x84> 20: 94 21 ff e0 stwu r1,-32(r1) 24: 93 e1 00 1c stw r31,28(r1) 28: 7d 8c 42 e6 mftb r12 ... 7c: 0f e0 00 00 twui r0,0 80: 0f e0 00 00 twui r0,0 84: 0f e0 00 00 twui r0,0 Without the patch: 00000000 <system_call_exception>: 0: 94 21 ff e0 stwu r1,-32(r1) 4: 93 e1 00 1c stw r31,28(r1) 8: 90 6a 00 88 stw r3,136(r10) c: 81 6a 00 84 lwz r11,132(r10) 10: 69 60 00 02 xori r0,r11,2 14: 54 00 ff fe rlwinm r0,r0,31,31,31 18: 0f 00 00 00 twnei r0,0 1c: 69 60 40 00 xori r0,r11,16384 20: 54 00 97 fe rlwinm r0,r0,18,31,31 24: 0f 00 00 00 twnei r0,0 28: 69 6b 80 00 xori r11,r11,32768 2c: 55 6b 8f fe rlwinm r11,r11,17,31,31 30: 0f 0b 00 00 twnei r11,0 34: 7d 8c 42 e6 mftb r12 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> --- arch/powerpc/include/asm/bug.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index d1635ffbb179..101dea4eec8d 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -68,7 +68,11 @@ BUG_ENTRY("twi 31, 0, 0", 0); \ unreachable(); \ } while (0) +#define HAVE_ARCH_BUG + +#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags)) +#ifdef CONFIG_PPC64 #define BUG_ON(x) do { \ if (__builtin_constant_p(x)) { \ if (x) \ @@ -78,8 +82,6 @@ } \ } while (0) -#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags)) - #define WARN_ON(x) ({ \ int __ret_warn_on = !!(x); \ if (__builtin_constant_p(__ret_warn_on)) { \ @@ -93,9 +95,10 @@ unlikely(__ret_warn_on); \ }) -#define HAVE_ARCH_BUG #define HAVE_ARCH_BUG_ON #define HAVE_ARCH_WARN_ON +#endif + #endif /* __ASSEMBLY __ */ #else #ifdef __ASSEMBLY__ -- 2.25.0 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy @ 2021-04-13 16:38 ` Christophe Leroy 2021-08-13 6:19 ` Nicholas Piggin ` (2 more replies) 2021-08-13 6:08 ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Nicholas Piggin 2021-08-18 13:38 ` Michael Ellerman 2 siblings, 3 replies; 24+ messages in thread From: Christophe Leroy @ 2021-04-13 16:38 UTC (permalink / raw) To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: linux-kernel, linuxppc-dev Using asm goto in __WARN_FLAGS() and WARN_ON() allows more flexibility to GCC. For that add an entry to the exception table so that program_check_exception() knowns where to resume execution after a WARNING. Here are two exemples. The first one is done on PPC32 (which benefits from the previous patch), the second is on PPC64. unsigned long test(struct pt_regs *regs) { int ret; WARN_ON(regs->msr & MSR_PR); return regs->gpr[3]; } unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 000003a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc <test+0x14> 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 3c0: 80 63 00 0c lwz r3,12(r3) 3c4: 4e 80 00 20 blr 0000000000000bf0 <.test9w>: bf0: 7c 89 00 74 cntlzd r9,r4 bf4: 79 29 d1 82 rldicl r9,r9,58,6 bf8: 0b 09 00 00 tdnei r9,0 bfc: 2c 24 00 00 cmpdi r4,0 c00: 41 82 00 0c beq c0c <.test9w+0x1c> c04: 7c 63 23 92 divdu r3,r3,r4 c08: 4e 80 00 20 blr c0c: 38 60 00 00 li r3,0 c10: 4e 80 00 20 blr After the patch: 000003a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc <test+0x14> 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 0000000000000c50 <.test9w>: c50: 7c 89 00 74 cntlzd r9,r4 c54: 79 29 d1 82 rldicl r9,r9,58,6 c58: 0b 09 00 00 tdnei r9,0 c5c: 7c 63 23 92 divdu r3,r3,r4 c60: 4e 80 00 20 blr c70: 38 60 00 00 li r3,0 c74: 4e 80 00 20 blr In the first exemple, we see GCC doesn't need to duplicate what happens after the trap. In the second exemple, we see that GCC doesn't need to emit a test and a branch in the likely path in addition to the trap. We've got some WARN_ON() in .softirqentry.text section so it needs to be added in the OTHER_TEXT_SECTIONS in modpost.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> --- v2: Fix build failure when CONFIG_BUG is not selected. --- arch/powerpc/include/asm/book3s/64/kup.h | 2 +- arch/powerpc/include/asm/bug.h | 54 ++++++++++++++++++++---- arch/powerpc/include/asm/extable.h | 14 ++++++ arch/powerpc/include/asm/ppc_asm.h | 11 +---- arch/powerpc/kernel/entry_64.S | 2 +- arch/powerpc/kernel/exceptions-64e.S | 2 +- arch/powerpc/kernel/misc_32.S | 2 +- arch/powerpc/kernel/traps.c | 9 +++- scripts/mod/modpost.c | 2 +- 9 files changed, 72 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 9700da3a4093..a22839cba32e 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -90,7 +90,7 @@ /* Prevent access to userspace using any key values */ LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED) 999: tdne \gpr1, \gpr2 - EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) + EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67) #endif .endm diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 101dea4eec8d..e22dc503fb2f 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -4,6 +4,7 @@ #ifdef __KERNEL__ #include <asm/asm-compat.h> +#include <asm/extable.h> #ifdef CONFIG_BUG @@ -30,6 +31,11 @@ .endm #endif /* verbose */ +.macro EMIT_WARN_ENTRY addr,file,line,flags + EX_TABLE(\addr,\addr+4) + EMIT_BUG_ENTRY \addr,\file,\line,\flags +.endm + #else /* !__ASSEMBLY__ */ /* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and sizeof(struct bug_entry), respectively */ @@ -58,6 +64,16 @@ "i" (sizeof(struct bug_entry)), \ ##__VA_ARGS__) +#define WARN_ENTRY(insn, flags, label, ...) \ + asm_volatile_goto( \ + "1: " insn "\n" \ + EX_TABLE(1b, %l[label]) \ + _EMIT_BUG_ENTRY \ + : : "i" (__FILE__), "i" (__LINE__), \ + "i" (flags), \ + "i" (sizeof(struct bug_entry)), \ + ##__VA_ARGS__ : : label) + /* * BUG_ON() and WARN_ON() do their best to cooperate with compile-time * optimisations. However depending on the complexity of the condition @@ -70,7 +86,15 @@ } while (0) #define HAVE_ARCH_BUG -#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags)) +#define __WARN_FLAGS(flags) do { \ + __label__ __label_warn_on; \ + \ + WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags), __label_warn_on); \ + unreachable(); \ + \ +__label_warn_on: \ + break; \ +} while (0) #ifdef CONFIG_PPC64 #define BUG_ON(x) do { \ @@ -83,15 +107,24 @@ } while (0) #define WARN_ON(x) ({ \ - int __ret_warn_on = !!(x); \ - if (__builtin_constant_p(__ret_warn_on)) { \ - if (__ret_warn_on) \ + bool __ret_warn_on = false; \ + do { \ + if (__builtin_constant_p((x))) { \ + if (!(x)) \ + break; \ __WARN(); \ - } else { \ - BUG_ENTRY(PPC_TLNEI " %4, 0", \ - BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ - "r" (__ret_warn_on)); \ - } \ + __ret_warn_on = true; \ + } else { \ + __label__ __label_warn_on; \ + \ + WARN_ENTRY(PPC_TLNEI " %4, 0", \ + BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ + __label_warn_on, "r" (x)); \ + break; \ +__label_warn_on: \ + __ret_warn_on = true; \ + } \ + } while (0); \ unlikely(__ret_warn_on); \ }) @@ -104,8 +137,11 @@ #ifdef __ASSEMBLY__ .macro EMIT_BUG_ENTRY addr,file,line,flags .endm +.macro EMIT_WARN_ENTRY addr,file,line,flags +.endm #else /* !__ASSEMBLY__ */ #define _EMIT_BUG_ENTRY +#define _EMIT_WARN_ENTRY #endif #endif /* CONFIG_BUG */ diff --git a/arch/powerpc/include/asm/extable.h b/arch/powerpc/include/asm/extable.h index eb91b2d2935a..26ce2e5c0fa8 100644 --- a/arch/powerpc/include/asm/extable.h +++ b/arch/powerpc/include/asm/extable.h @@ -17,6 +17,8 @@ #define ARCH_HAS_RELATIVE_EXTABLE +#ifndef __ASSEMBLY__ + struct exception_table_entry { int insn; int fixup; @@ -28,3 +30,15 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x) } #endif + +/* + * Helper macro for exception table entries + */ +#define EX_TABLE(_fault, _target) \ + stringify_in_c(.section __ex_table,"a";)\ + stringify_in_c(.balign 4;) \ + stringify_in_c(.long (_fault) - . ;) \ + stringify_in_c(.long (_target) - . ;) \ + stringify_in_c(.previous) + +#endif diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 8998122fc7e2..a74e1561535a 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -10,6 +10,7 @@ #include <asm/ppc-opcode.h> #include <asm/firmware.h> #include <asm/feature-fixups.h> +#include <asm/extable.h> #ifdef __ASSEMBLY__ @@ -772,16 +773,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96) #endif /* __ASSEMBLY__ */ -/* - * Helper macro for exception table entries - */ -#define EX_TABLE(_fault, _target) \ - stringify_in_c(.section __ex_table,"a";)\ - stringify_in_c(.balign 4;) \ - stringify_in_c(.long (_fault) - . ;) \ - stringify_in_c(.long (_target) - . ;) \ - stringify_in_c(.previous) - #ifdef CONFIG_PPC_FSL_BOOK3E #define BTB_FLUSH(reg) \ lis reg,BUCSR_INIT@h; \ diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 6c4d9e276c4d..faa64cc90f02 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -835,7 +835,7 @@ _GLOBAL(enter_rtas) */ lbz r0,PACAIRQSOFTMASK(r13) 1: tdeqi r0,IRQS_ENABLED - EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING + EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING #endif /* Hard-disable interrupts */ diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index e8eb9992a270..3f8c51390a04 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1249,7 +1249,7 @@ fast_exception_return: /* The interrupt should not have soft enabled. */ lbz r7,PACAIRQSOFTMASK(r13) 1: tdeqi r7,IRQS_ENABLED - EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING + EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING #endif b fast_exception_return diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 6a076bef2932..21390f3119a9 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -237,7 +237,7 @@ _GLOBAL(copy_page) addi r3,r3,-4 0: twnei r5, 0 /* WARN if r3 is not cache aligned */ - EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING + EMIT_WARN_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING addi r4,r4,-4 diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index efba99870691..ee40a637313d 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1467,8 +1467,13 @@ static void do_program_check(struct pt_regs *regs) if (!(regs->msr & MSR_PR) && /* not user-mode */ report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) { - regs->nip += 4; - return; + const struct exception_table_entry *entry; + + entry = search_exception_tables(bugaddr); + if (entry) { + regs->nip = extable_fixup(entry) + regs->nip - bugaddr; + return; + } } _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); return; diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 24725e50c7b4..34745f239208 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf, ".kprobes.text", ".cpuidle.text", ".noinstr.text" #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \ ".fixup", ".entry.text", ".exception.text", ".text.*", \ - ".coldtext" + ".coldtext .softirqentry.text" #define INIT_SECTIONS ".init.*" #define MEM_INIT_SECTIONS ".meminit.*" -- 2.25.0 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy @ 2021-08-13 6:19 ` Nicholas Piggin 2021-08-15 3:49 ` Michael Ellerman 2021-08-25 21:25 ` Nathan Chancellor 2 siblings, 0 replies; 24+ messages in thread From: Nicholas Piggin @ 2021-08-13 6:19 UTC (permalink / raw) To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman, Paul Mackerras Cc: linux-kernel, linuxppc-dev Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am: > Using asm goto in __WARN_FLAGS() and WARN_ON() allows more > flexibility to GCC. > > For that add an entry to the exception table so that > program_check_exception() knowns where to resume execution > after a WARNING. Nice idea. How much does it bloat the exception table? How easy would it be to make a different exception table for unimportant stuff like WARN_ON faults? > > Here are two exemples. The first one is done on PPC32 (which > benefits from the previous patch), the second is on PPC64. > > unsigned long test(struct pt_regs *regs) > { > int ret; > > WARN_ON(regs->msr & MSR_PR); > > return regs->gpr[3]; > } > > unsigned long test9w(unsigned long a, unsigned long b) > { > if (WARN_ON(!b)) > return 0; > return a / b; > } > > Before the patch: > > 000003a8 <test>: > 3a8: 81 23 00 84 lwz r9,132(r3) > 3ac: 71 29 40 00 andi. r9,r9,16384 > 3b0: 40 82 00 0c bne 3bc <test+0x14> > 3b4: 80 63 00 0c lwz r3,12(r3) > 3b8: 4e 80 00 20 blr > > 3bc: 0f e0 00 00 twui r0,0 > 3c0: 80 63 00 0c lwz r3,12(r3) > 3c4: 4e 80 00 20 blr > > 0000000000000bf0 <.test9w>: > bf0: 7c 89 00 74 cntlzd r9,r4 > bf4: 79 29 d1 82 rldicl r9,r9,58,6 > bf8: 0b 09 00 00 tdnei r9,0 > bfc: 2c 24 00 00 cmpdi r4,0 > c00: 41 82 00 0c beq c0c <.test9w+0x1c> > c04: 7c 63 23 92 divdu r3,r3,r4 > c08: 4e 80 00 20 blr > > c0c: 38 60 00 00 li r3,0 > c10: 4e 80 00 20 blr > > After the patch: > > 000003a8 <test>: > 3a8: 81 23 00 84 lwz r9,132(r3) > 3ac: 71 29 40 00 andi. r9,r9,16384 > 3b0: 40 82 00 0c bne 3bc <test+0x14> > 3b4: 80 63 00 0c lwz r3,12(r3) > 3b8: 4e 80 00 20 blr > > 3bc: 0f e0 00 00 twui r0,0 > > 0000000000000c50 <.test9w>: > c50: 7c 89 00 74 cntlzd r9,r4 > c54: 79 29 d1 82 rldicl r9,r9,58,6 > c58: 0b 09 00 00 tdnei r9,0 > c5c: 7c 63 23 92 divdu r3,r3,r4 > c60: 4e 80 00 20 blr > > c70: 38 60 00 00 li r3,0 > c74: 4e 80 00 20 blr One thing that would be nice for WARN_ON_ONCE is to patch out the trap after the program interrupt. I've seen sometimes the WARN_ON_ONCE starts firing a lot and slows down the kernel. That gets harder to do if the trap is now part of the control flow. I guess that's less important case for performance though. And in theory you might be able to replace the trap with an equivalent cmp and branch (but that's probably going too over engineering it). Thanks, Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy 2021-08-13 6:19 ` Nicholas Piggin @ 2021-08-15 3:49 ` Michael Ellerman 2021-08-25 21:25 ` Nathan Chancellor 2 siblings, 0 replies; 24+ messages in thread From: Michael Ellerman @ 2021-08-15 3:49 UTC (permalink / raw) To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras Cc: linux-kernel, linuxppc-dev Christophe Leroy <christophe.leroy@csgroup.eu> writes: > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c > index 24725e50c7b4..34745f239208 100644 > --- a/scripts/mod/modpost.c > +++ b/scripts/mod/modpost.c > @@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf, > ".kprobes.text", ".cpuidle.text", ".noinstr.text" > #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \ > ".fixup", ".entry.text", ".exception.text", ".text.*", \ > - ".coldtext" > + ".coldtext .softirqentry.text" This wasn't working, I updated it to: ".coldtext", ".softirqentry.text" Which works. cheers ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy 2021-08-13 6:19 ` Nicholas Piggin 2021-08-15 3:49 ` Michael Ellerman @ 2021-08-25 21:25 ` Nathan Chancellor 2021-08-26 3:21 ` Michael Ellerman 2 siblings, 1 reply; 24+ messages in thread From: Nathan Chancellor @ 2021-08-25 21:25 UTC (permalink / raw) To: Christophe Leroy Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linux-kernel, linuxppc-dev, clang-built-linux, llvm Hi Christophe, On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: > Using asm goto in __WARN_FLAGS() and WARN_ON() allows more > flexibility to GCC. > > For that add an entry to the exception table so that > program_check_exception() knowns where to resume execution > after a WARNING. > > Here are two exemples. The first one is done on PPC32 (which > benefits from the previous patch), the second is on PPC64. > > unsigned long test(struct pt_regs *regs) > { > int ret; > > WARN_ON(regs->msr & MSR_PR); > > return regs->gpr[3]; > } > > unsigned long test9w(unsigned long a, unsigned long b) > { > if (WARN_ON(!b)) > return 0; > return a / b; > } > > Before the patch: > > 000003a8 <test>: > 3a8: 81 23 00 84 lwz r9,132(r3) > 3ac: 71 29 40 00 andi. r9,r9,16384 > 3b0: 40 82 00 0c bne 3bc <test+0x14> > 3b4: 80 63 00 0c lwz r3,12(r3) > 3b8: 4e 80 00 20 blr > > 3bc: 0f e0 00 00 twui r0,0 > 3c0: 80 63 00 0c lwz r3,12(r3) > 3c4: 4e 80 00 20 blr > > 0000000000000bf0 <.test9w>: > bf0: 7c 89 00 74 cntlzd r9,r4 > bf4: 79 29 d1 82 rldicl r9,r9,58,6 > bf8: 0b 09 00 00 tdnei r9,0 > bfc: 2c 24 00 00 cmpdi r4,0 > c00: 41 82 00 0c beq c0c <.test9w+0x1c> > c04: 7c 63 23 92 divdu r3,r3,r4 > c08: 4e 80 00 20 blr > > c0c: 38 60 00 00 li r3,0 > c10: 4e 80 00 20 blr > > After the patch: > > 000003a8 <test>: > 3a8: 81 23 00 84 lwz r9,132(r3) > 3ac: 71 29 40 00 andi. r9,r9,16384 > 3b0: 40 82 00 0c bne 3bc <test+0x14> > 3b4: 80 63 00 0c lwz r3,12(r3) > 3b8: 4e 80 00 20 blr > > 3bc: 0f e0 00 00 twui r0,0 > > 0000000000000c50 <.test9w>: > c50: 7c 89 00 74 cntlzd r9,r4 > c54: 79 29 d1 82 rldicl r9,r9,58,6 > c58: 0b 09 00 00 tdnei r9,0 > c5c: 7c 63 23 92 divdu r3,r3,r4 > c60: 4e 80 00 20 blr > > c70: 38 60 00 00 li r3,0 > c74: 4e 80 00 20 blr > > In the first exemple, we see GCC doesn't need to duplicate what > happens after the trap. > > In the second exemple, we see that GCC doesn't need to emit a test > and a branch in the likely path in addition to the trap. > > We've got some WARN_ON() in .softirqentry.text section so it needs > to be added in the OTHER_TEXT_SECTIONS in modpost.c > > Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in klist_add_tail to trigger over and over on boot when compiling with clang: [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110 [ 2.177456][ T1] Modules linked in: [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1 [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000 [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825) [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000 [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0 [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001 [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0 [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00 [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000 [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778 [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0 [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110 [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290 [ 2.178088][ T1] Call Trace: [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable) [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290 [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190 [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0 [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58 [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0 [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188 [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220 [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60 [ 2.178569][ T1] Instruction dump: [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018 [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024 [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]--- Is this a bug with clang or is there something wrong with the patch? The vmlinux image is available at [1] if you want to inspect it and our QEMU command and the warning at boot can be viewed at [2]. If there is any other information I can provide, please let me know. [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/ [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs Cheers, Nathan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-25 21:25 ` Nathan Chancellor @ 2021-08-26 3:21 ` Michael Ellerman 2021-08-26 6:37 ` Christophe Leroy ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Michael Ellerman @ 2021-08-26 3:21 UTC (permalink / raw) To: Nathan Chancellor, Christophe Leroy Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel, linuxppc-dev, clang-built-linux, llvm Nathan Chancellor <nathan@kernel.org> writes: > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more >> flexibility to GCC. ... > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in > klist_add_tail to trigger over and over on boot when compiling with > clang: > > [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110 > [ 2.177456][ T1] Modules linked in: > [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1 > [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000 > [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825) > [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000 > [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0 > [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001 > [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0 > [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00 > [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000 > [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778 > [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0 > [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110 > [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290 > [ 2.178088][ T1] Call Trace: > [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable) > [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290 > [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190 > [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0 > [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58 > [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0 > [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188 > [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc > [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c > [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220 > [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60 > [ 2.178569][ T1] Instruction dump: > [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018 > [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024 > [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]--- > > Is this a bug with clang or is there something wrong with the patch? The > vmlinux image is available at [1] if you want to inspect it and our QEMU > command and the warning at boot can be viewed at [2]. If there is any > other information I can provide, please let me know. > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/ > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs Thanks. This is the generated assembly: c0000000007ff600 <.klist_add_tail>: c0000000007ff600: 7c 08 02 a6 mflr r0 c0000000007ff604: f8 01 00 10 std r0,16(r1) c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1 c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref) c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero From: static void knode_set_klist(struct klist_node *knode, struct klist *klist) { knode->n_klist = klist; /* no knode deserves to start its life dead */ WARN_ON(knode_dead(knode)); ^^^^^^^^^^^^^^^^^ } Which expands to: static void knode_set_klist(struct klist_node *knode, struct klist *klist) { knode->n_klist = klist; ({ bool __ret_warn_on = false; do { ... } else { __label__ __label_warn_on; do { asm goto( "1: " "tdnei" " " " % 4, 0 " "\n " ".section __ex_table,\"a\";" " " ".balign 4;" " " ".long (1b) - . ;" " " ".long (%l[__label_warn_on]) - . ;" " " ".previous" " " ".section __bug_table,\"aw\"\n" "2:\t.4byte 1b - 2b, %0 - 2b\n" "\t.short %1, %2\n" ".org 2b+%3\n" ".previous\n" : : "i"("lib/klist.c"), "i"(62), "i"((1 << 0) | ((9) << 8)), "i"(sizeof(struct bug_entry)), "r"(knode_dead(knode)) ^^^^^^^^^^^^^^^^^^^^^ : : __label_warn_on); asm(""); } while (0); break; __label_warn_on: __ret_warn_on = true; } } while (0); __builtin_expect(!!(__ret_warn_on), 0); }); } And knode_dead is: #define KNODE_DEAD 1LU static bool knode_dead(struct klist_node *knode) { return (unsigned long)knode->n_klist & KNODE_DEAD; } So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero. But in the asm: c0000000007ff600 <.klist_add_tail>: ... c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k ... c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD". In the GCC output you can see it: c0000000008c8a30 <.klist_node_init>: c0000000008c8a30: 39 24 00 08 addi r9,r4,8 c0000000008c8a34: 39 40 00 01 li r10,1 c0000000008c8a38: f9 24 00 08 std r9,8(r4) c0000000008c8a3c: f9 24 00 10 std r9,16(r4) c0000000008c8a40: 91 44 00 18 stw r10,24(r4) c0000000008c8a44: f8 64 00 00 std r3,0(r4) c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31 c0000000008c8a4c: 0b 09 00 00 tdnei r9,0 ie. the clrlwi is "clear left (word) immediate", and zeroes everything except bit 0, which is equivalent to "& KNODE_DEAD". So there seems to be some misunderstanding with clang, it doesn't like us passing an expression to the inline asm. AFAIK it is legal to pass expressions as inputs to inline asm, ie. it doesn't have to just be a variable name. This patch seems to fix it. Not sure if that's just papering over it though. diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 1ee0f22313ee..75fcb4370d96 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -119,7 +119,7 @@ __label_warn_on: \ \ WARN_ENTRY(PPC_TLNEI " %4, 0", \ BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ - __label_warn_on, "r" (x)); \ + __label_warn_on, "r" (!!(x))); \ break; \ __label_warn_on: \ __ret_warn_on = true; \ Generates: c0000000008e2ac0 <.klist_add_tail>: c0000000008e2ac0: 7c 08 02 a6 mflr r0 c0000000008e2ac4: f8 01 00 10 std r0,16(r1) c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1) c0000000008e2acc: fb a1 00 78 std r29,120(r1) c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3 c0000000008e2ad4: 38 60 00 01 li r3,1 c0000000008e2ad8: fb c1 00 80 std r30,128(r1) c0000000008e2adc: 7c 9e 23 78 mr r30,r4 c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24 c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <- c0000000008e2ae8: fb 81 00 70 std r28,112(r1) c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8 c0000000008e2af0: fb 9d 00 08 std r28,8(r29) c0000000008e2af4: fb 9d 00 10 std r28,16(r29) c0000000008e2af8: 90 64 00 00 stw r3,0(r4) c0000000008e2afc: fb dd 00 00 std r30,0(r29) c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <- cheers ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 3:21 ` Michael Ellerman @ 2021-08-26 6:37 ` Christophe Leroy 2021-08-26 13:47 ` Segher Boessenkool 2021-08-26 14:45 ` Michael Ellerman 2021-08-26 14:12 ` Segher Boessenkool 2021-08-26 18:54 ` Nathan Chancellor 2 siblings, 2 replies; 24+ messages in thread From: Christophe Leroy @ 2021-08-26 6:37 UTC (permalink / raw) To: Michael Ellerman, Nathan Chancellor Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel, linuxppc-dev, clang-built-linux, llvm Le 26/08/2021 à 05:21, Michael Ellerman a écrit : > Nathan Chancellor <nathan@kernel.org> writes: >> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: >>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more >>> flexibility to GCC. > ... >> >> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better >> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in >> klist_add_tail to trigger over and over on boot when compiling with >> clang: ... > > This patch seems to fix it. Not sure if that's just papering over it though. > > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h > index 1ee0f22313ee..75fcb4370d96 100644 > --- a/arch/powerpc/include/asm/bug.h > +++ b/arch/powerpc/include/asm/bug.h > @@ -119,7 +119,7 @@ __label_warn_on: \ > \ > WARN_ENTRY(PPC_TLNEI " %4, 0", \ > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ > - __label_warn_on, "r" (x)); \ > + __label_warn_on, "r" (!!(x))); \ > break; \ > __label_warn_on: \ > __ret_warn_on = true; \ > > But for a simple WARN_ON() call: void test(unsigned long b) { WARN_ON(b); } Without your change with GCC you get: 00000000000012d0 <.test>: 12d0: 0b 03 00 00 tdnei r3,0 12d4: 4e 80 00 20 blr With the !! change you get: 00000000000012d0 <.test>: 12d0: 31 23 ff ff addic r9,r3,-1 12d4: 7d 29 19 10 subfe r9,r9,r3 12d8: 0b 09 00 00 tdnei r9,0 12dc: 4e 80 00 20 blr Christophe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 6:37 ` Christophe Leroy @ 2021-08-26 13:47 ` Segher Boessenkool 2021-08-26 14:45 ` Michael Ellerman 1 sibling, 0 replies; 24+ messages in thread From: Segher Boessenkool @ 2021-08-26 13:47 UTC (permalink / raw) To: Christophe Leroy Cc: Michael Ellerman, Nathan Chancellor, llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev On Thu, Aug 26, 2021 at 08:37:09AM +0200, Christophe Leroy wrote: > Le 26/08/2021 à 05:21, Michael Ellerman a écrit : > >This patch seems to fix it. Not sure if that's just papering over it > >though. > > > >diff --git a/arch/powerpc/include/asm/bug.h > >b/arch/powerpc/include/asm/bug.h > >index 1ee0f22313ee..75fcb4370d96 100644 > >--- a/arch/powerpc/include/asm/bug.h > >+++ b/arch/powerpc/include/asm/bug.h > >@@ -119,7 +119,7 @@ __label_warn_on: \ > > \ > > WARN_ENTRY(PPC_TLNEI " %4, 0", \ > > BUGFLAG_WARNING | > > BUGFLAG_TAINT(TAINT_WARN), \ > >- __label_warn_on, "r" (x)); \ > >+ __label_warn_on, "r" (!!(x))); \ > > break; \ > > __label_warn_on: \ > > __ret_warn_on = true; \ > > But for a simple WARN_ON() call: > > void test(unsigned long b) > { > WARN_ON(b); > } > > Without your change with GCC you get: > > 00000000000012d0 <.test>: > 12d0: 0b 03 00 00 tdnei r3,0 > 12d4: 4e 80 00 20 blr > > > With the !! change you get: > > 00000000000012d0 <.test>: > 12d0: 31 23 ff ff addic r9,r3,-1 > 12d4: 7d 29 19 10 subfe r9,r9,r3 > 12d8: 0b 09 00 00 tdnei r9,0 > 12dc: 4e 80 00 20 blr That is because the asm (unlike the builtin) cannot be optimised by the compiler. Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 6:37 ` Christophe Leroy 2021-08-26 13:47 ` Segher Boessenkool @ 2021-08-26 14:45 ` Michael Ellerman 2021-08-26 14:53 ` Christophe Leroy 1 sibling, 1 reply; 24+ messages in thread From: Michael Ellerman @ 2021-08-26 14:45 UTC (permalink / raw) To: Christophe Leroy, Nathan Chancellor Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel, linuxppc-dev, clang-built-linux, llvm Christophe Leroy <christophe.leroy@csgroup.eu> writes: > Le 26/08/2021 à 05:21, Michael Ellerman a écrit : >> Nathan Chancellor <nathan@kernel.org> writes: >>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: >>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more >>>> flexibility to GCC. >> ... >>> >>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better >>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in >>> klist_add_tail to trigger over and over on boot when compiling with >>> clang: > > ... > >> >> This patch seems to fix it. Not sure if that's just papering over it though. >> >> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h >> index 1ee0f22313ee..75fcb4370d96 100644 >> --- a/arch/powerpc/include/asm/bug.h >> +++ b/arch/powerpc/include/asm/bug.h >> @@ -119,7 +119,7 @@ __label_warn_on: \ >> \ >> WARN_ENTRY(PPC_TLNEI " %4, 0", \ >> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ >> - __label_warn_on, "r" (x)); \ >> + __label_warn_on, "r" (!!(x))); \ >> break; \ >> __label_warn_on: \ >> __ret_warn_on = true; \ > > But for a simple WARN_ON() call: > > void test(unsigned long b) > { > WARN_ON(b); > } > > Without your change with GCC you get: > > 00000000000012d0 <.test>: > 12d0: 0b 03 00 00 tdnei r3,0 > 12d4: 4e 80 00 20 blr > > > With the !! change you get: > > 00000000000012d0 <.test>: > 12d0: 31 23 ff ff addic r9,r3,-1 > 12d4: 7d 29 19 10 subfe r9,r9,r3 > 12d8: 0b 09 00 00 tdnei r9,0 > 12dc: 4e 80 00 20 blr Yeah that's a pity. We could do something like below, which is ugly, but would be better than having to revert the whole thing. Although this doesn't fix the strange warning in drivers/net/ethernet/sfc. So possibly we need a CLANG ifdef around the whole thing, and use the old style warn for clang. cheers diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 1ee0f22313ee..d978d9004d0d 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -106,6 +106,12 @@ __label_warn_on: \ } \ } while (0) +#ifdef CONFIG_CC_IS_CLANG +#define __clang_warn_hack(x) (!!(x)) +#else +#define __clang_warn_hack(x) (x) +#endif + #define WARN_ON(x) ({ \ bool __ret_warn_on = false; \ do { \ @@ -119,7 +125,8 @@ __label_warn_on: \ \ WARN_ENTRY(PPC_TLNEI " %4, 0", \ BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ - __label_warn_on, "r" (x)); \ + __label_warn_on, \ + "r" __clang_warn_hack(x)); \ break; \ __label_warn_on: \ __ret_warn_on = true; \ ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 14:45 ` Michael Ellerman @ 2021-08-26 14:53 ` Christophe Leroy 0 siblings, 0 replies; 24+ messages in thread From: Christophe Leroy @ 2021-08-26 14:53 UTC (permalink / raw) To: Michael Ellerman, Nathan Chancellor Cc: Benjamin Herrenschmidt, Paul Mackerras, linux-kernel, linuxppc-dev, clang-built-linux, llvm Le 26/08/2021 à 16:45, Michael Ellerman a écrit : > Christophe Leroy <christophe.leroy@csgroup.eu> writes: >> Le 26/08/2021 à 05:21, Michael Ellerman a écrit : >>> Nathan Chancellor <nathan@kernel.org> writes: >>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: >>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more >>>>> flexibility to GCC. >>> ... >>>> >>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better >>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in >>>> klist_add_tail to trigger over and over on boot when compiling with >>>> clang: >> >> ... >> >>> >>> This patch seems to fix it. Not sure if that's just papering over it though. >>> >>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h >>> index 1ee0f22313ee..75fcb4370d96 100644 >>> --- a/arch/powerpc/include/asm/bug.h >>> +++ b/arch/powerpc/include/asm/bug.h >>> @@ -119,7 +119,7 @@ __label_warn_on: \ >>> \ >>> WARN_ENTRY(PPC_TLNEI " %4, 0", \ >>> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ >>> - __label_warn_on, "r" (x)); \ >>> + __label_warn_on, "r" (!!(x))); \ >>> break; \ >>> __label_warn_on: \ >>> __ret_warn_on = true; \ >> >> But for a simple WARN_ON() call: >> >> void test(unsigned long b) >> { >> WARN_ON(b); >> } >> >> Without your change with GCC you get: >> >> 00000000000012d0 <.test>: >> 12d0: 0b 03 00 00 tdnei r3,0 >> 12d4: 4e 80 00 20 blr >> >> >> With the !! change you get: >> >> 00000000000012d0 <.test>: >> 12d0: 31 23 ff ff addic r9,r3,-1 >> 12d4: 7d 29 19 10 subfe r9,r9,r3 >> 12d8: 0b 09 00 00 tdnei r9,0 >> 12dc: 4e 80 00 20 blr > > Yeah that's a pity. > > We could do something like below, which is ugly, but would be better > than having to revert the whole thing. Yes looks nice, we already had that kind of stuff in the past, even more ugly. > > Although this doesn't fix the strange warning in drivers/net/ethernet/sfc. What's the warning ? > > So possibly we need a CLANG ifdef around the whole thing, and use the > old style warn for clang. > Christophe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 3:21 ` Michael Ellerman 2021-08-26 6:37 ` Christophe Leroy @ 2021-08-26 14:12 ` Segher Boessenkool 2021-08-26 18:54 ` Nathan Chancellor 2 siblings, 0 replies; 24+ messages in thread From: Segher Boessenkool @ 2021-08-26 14:12 UTC (permalink / raw) To: Michael Ellerman Cc: Nathan Chancellor, Christophe Leroy, llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote: > So there seems to be some misunderstanding with clang, it doesn't like > us passing an expression to the inline asm. > > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it > doesn't have to just be a variable name. It certainly is. That is the whole point of inline asm! This way, all of the C code "around" the asm can be optimised. > This patch seems to fix it. Not sure if that's just papering over it though. It is, and it makes less optimised code (also on GCC), as Christophe points out. Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 3:21 ` Michael Ellerman 2021-08-26 6:37 ` Christophe Leroy 2021-08-26 14:12 ` Segher Boessenkool @ 2021-08-26 18:54 ` Nathan Chancellor 2021-08-26 23:55 ` Nathan Chancellor 2 siblings, 1 reply; 24+ messages in thread From: Nathan Chancellor @ 2021-08-26 18:54 UTC (permalink / raw) To: Michael Ellerman Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, linux-kernel, linuxppc-dev, clang-built-linux, llvm On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote: > Nathan Chancellor <nathan@kernel.org> writes: > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more > >> flexibility to GCC. > ... > > > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in > > klist_add_tail to trigger over and over on boot when compiling with > > clang: > > > > [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110 > > [ 2.177456][ T1] Modules linked in: > > [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1 > > [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000 > > [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825) > > [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000 > > [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0 > > [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001 > > [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0 > > [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00 > > [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000 > > [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778 > > [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0 > > [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110 > > [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290 > > [ 2.178088][ T1] Call Trace: > > [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable) > > [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290 > > [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190 > > [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0 > > [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58 > > [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0 > > [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188 > > [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc > > [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c > > [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220 > > [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60 > > [ 2.178569][ T1] Instruction dump: > > [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018 > > [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024 > > [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]--- > > > > Is this a bug with clang or is there something wrong with the patch? The > > vmlinux image is available at [1] if you want to inspect it and our QEMU > > command and the warning at boot can be viewed at [2]. If there is any > > other information I can provide, please let me know. > > > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/ > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs > > Thanks. > > This is the generated assembly: > > c0000000007ff600 <.klist_add_tail>: > c0000000007ff600: 7c 08 02 a6 mflr r0 > c0000000007ff604: f8 01 00 10 std r0,16(r1) > c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog > c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack > c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n > c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1 > c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack > c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node > c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k > c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref > c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD > c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node > c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref) > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero > > > From: > > static void knode_set_klist(struct klist_node *knode, struct klist *klist) > { > knode->n_klist = klist; > /* no knode deserves to start its life dead */ > WARN_ON(knode_dead(knode)); > ^^^^^^^^^^^^^^^^^ > } > > Which expands to: > > static void knode_set_klist(struct klist_node *knode, struct klist *klist) > { > knode->n_klist = klist; > > ({ > bool __ret_warn_on = false; > do { > ... > } else { > __label__ __label_warn_on; > do { > asm goto( > "1: " > "tdnei" > " > " " % 4, > 0 " "\n " ".section __ex_table,\"a\";" > " " > ".balign 4;" > " " > ".long (1b) - . ;" > " " > ".long (%l[__label_warn_on]) - . ;" > " " > ".previous" > " " > ".section __bug_table,\"aw\"\n" > "2:\t.4byte 1b - 2b, %0 - 2b\n" > "\t.short %1, %2\n" > ".org 2b+%3\n" > ".previous\n" > : > : "i"("lib/klist.c"), "i"(62), > "i"((1 << 0) | ((9) << 8)), > "i"(sizeof(struct bug_entry)), > "r"(knode_dead(knode)) > ^^^^^^^^^^^^^^^^^^^^^ > > : > : __label_warn_on); > asm(""); > } while (0); > break; > __label_warn_on: > __ret_warn_on = true; > } > } while (0); > __builtin_expect(!!(__ret_warn_on), 0); > }); > } > > And knode_dead is: > > #define KNODE_DEAD 1LU > > static bool knode_dead(struct klist_node *knode) > { > return (unsigned long)knode->n_klist & KNODE_DEAD; > } > > > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero. > > But in the asm: > > c0000000007ff600 <.klist_add_tail>: > ... > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k > ... > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero > > > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD". > > In the GCC output you can see it: > > c0000000008c8a30 <.klist_node_init>: > c0000000008c8a30: 39 24 00 08 addi r9,r4,8 > c0000000008c8a34: 39 40 00 01 li r10,1 > c0000000008c8a38: f9 24 00 08 std r9,8(r4) > c0000000008c8a3c: f9 24 00 10 std r9,16(r4) > c0000000008c8a40: 91 44 00 18 stw r10,24(r4) > c0000000008c8a44: f8 64 00 00 std r3,0(r4) > c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31 > c0000000008c8a4c: 0b 09 00 00 tdnei r9,0 > > ie. the clrlwi is "clear left (word) immediate", and zeroes everything > except bit 0, which is equivalent to "& KNODE_DEAD". > > > So there seems to be some misunderstanding with clang, it doesn't like > us passing an expression to the inline asm. > > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it > doesn't have to just be a variable name. > > This patch seems to fix it. Not sure if that's just papering over it though. > > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h > index 1ee0f22313ee..75fcb4370d96 100644 > --- a/arch/powerpc/include/asm/bug.h > +++ b/arch/powerpc/include/asm/bug.h > @@ -119,7 +119,7 @@ __label_warn_on: \ > \ > WARN_ENTRY(PPC_TLNEI " %4, 0", \ > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ > - __label_warn_on, "r" (x)); \ > + __label_warn_on, "r" (!!(x))); \ > break; \ > __label_warn_on: \ > __ret_warn_on = true; \ > > > Generates: > > c0000000008e2ac0 <.klist_add_tail>: > c0000000008e2ac0: 7c 08 02 a6 mflr r0 > c0000000008e2ac4: f8 01 00 10 std r0,16(r1) > c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1) > c0000000008e2acc: fb a1 00 78 std r29,120(r1) > c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3 > c0000000008e2ad4: 38 60 00 01 li r3,1 > c0000000008e2ad8: fb c1 00 80 std r30,128(r1) > c0000000008e2adc: 7c 9e 23 78 mr r30,r4 > c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24 > c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <- > c0000000008e2ae8: fb 81 00 70 std r28,112(r1) > c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8 > c0000000008e2af0: fb 9d 00 08 std r28,8(r29) > c0000000008e2af4: fb 9d 00 10 std r28,16(r29) > c0000000008e2af8: 90 64 00 00 stw r3,0(r4) > c0000000008e2afc: fb dd 00 00 std r30,0(r29) > c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <- Thank you for the in-depth explanation and triage! I have gone ahead and created a standalone reproducer that shows this based on the preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634 so the LLVM developers can take a look. Cheers, Nathan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 18:54 ` Nathan Chancellor @ 2021-08-26 23:55 ` Nathan Chancellor 2021-08-27 7:53 ` Michael Ellerman 0 siblings, 1 reply; 24+ messages in thread From: Nathan Chancellor @ 2021-08-26 23:55 UTC (permalink / raw) To: Michael Ellerman Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev, Christophe Leroy, Nick Desaulniers On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote: > On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote: > > Nathan Chancellor <nathan@kernel.org> writes: > > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: > > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more > > >> flexibility to GCC. > > ... > > > > > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better > > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in > > > klist_add_tail to trigger over and over on boot when compiling with > > > clang: > > > > > > [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110 > > > [ 2.177456][ T1] Modules linked in: > > > [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1 > > > [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000 > > > [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825) > > > [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000 > > > [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0 > > > [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001 > > > [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0 > > > [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00 > > > [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000 > > > [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778 > > > [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0 > > > [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110 > > > [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290 > > > [ 2.178088][ T1] Call Trace: > > > [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable) > > > [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290 > > > [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190 > > > [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0 > > > [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58 > > > [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0 > > > [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188 > > > [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc > > > [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c > > > [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220 > > > [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60 > > > [ 2.178569][ T1] Instruction dump: > > > [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018 > > > [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024 > > > [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]--- > > > > > > Is this a bug with clang or is there something wrong with the patch? The > > > vmlinux image is available at [1] if you want to inspect it and our QEMU > > > command and the warning at boot can be viewed at [2]. If there is any > > > other information I can provide, please let me know. > > > > > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/ > > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs > > > > Thanks. > > > > This is the generated assembly: > > > > c0000000007ff600 <.klist_add_tail>: > > c0000000007ff600: 7c 08 02 a6 mflr r0 > > c0000000007ff604: f8 01 00 10 std r0,16(r1) > > c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog > > c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack > > c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n > > c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1 > > c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack > > c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node > > c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack > > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k > > c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref > > c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD > > c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node > > c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref) > > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k > > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero > > > > > > From: > > > > static void knode_set_klist(struct klist_node *knode, struct klist *klist) > > { > > knode->n_klist = klist; > > /* no knode deserves to start its life dead */ > > WARN_ON(knode_dead(knode)); > > ^^^^^^^^^^^^^^^^^ > > } > > > > Which expands to: > > > > static void knode_set_klist(struct klist_node *knode, struct klist *klist) > > { > > knode->n_klist = klist; > > > > ({ > > bool __ret_warn_on = false; > > do { > > ... > > } else { > > __label__ __label_warn_on; > > do { > > asm goto( > > "1: " > > "tdnei" > > " > > " " % 4, > > 0 " "\n " ".section __ex_table,\"a\";" > > " " > > ".balign 4;" > > " " > > ".long (1b) - . ;" > > " " > > ".long (%l[__label_warn_on]) - . ;" > > " " > > ".previous" > > " " > > ".section __bug_table,\"aw\"\n" > > "2:\t.4byte 1b - 2b, %0 - 2b\n" > > "\t.short %1, %2\n" > > ".org 2b+%3\n" > > ".previous\n" > > : > > : "i"("lib/klist.c"), "i"(62), > > "i"((1 << 0) | ((9) << 8)), > > "i"(sizeof(struct bug_entry)), > > "r"(knode_dead(knode)) > > ^^^^^^^^^^^^^^^^^^^^^ > > > > : > > : __label_warn_on); > > asm(""); > > } while (0); > > break; > > __label_warn_on: > > __ret_warn_on = true; > > } > > } while (0); > > __builtin_expect(!!(__ret_warn_on), 0); > > }); > > } > > > > And knode_dead is: > > > > #define KNODE_DEAD 1LU > > > > static bool knode_dead(struct klist_node *knode) > > { > > return (unsigned long)knode->n_klist & KNODE_DEAD; > > } > > > > > > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero. > > > > But in the asm: > > > > c0000000007ff600 <.klist_add_tail>: > > ... > > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k > > ... > > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k > > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero > > > > > > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD". > > > > In the GCC output you can see it: > > > > c0000000008c8a30 <.klist_node_init>: > > c0000000008c8a30: 39 24 00 08 addi r9,r4,8 > > c0000000008c8a34: 39 40 00 01 li r10,1 > > c0000000008c8a38: f9 24 00 08 std r9,8(r4) > > c0000000008c8a3c: f9 24 00 10 std r9,16(r4) > > c0000000008c8a40: 91 44 00 18 stw r10,24(r4) > > c0000000008c8a44: f8 64 00 00 std r3,0(r4) > > c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31 > > c0000000008c8a4c: 0b 09 00 00 tdnei r9,0 > > > > ie. the clrlwi is "clear left (word) immediate", and zeroes everything > > except bit 0, which is equivalent to "& KNODE_DEAD". > > > > > > So there seems to be some misunderstanding with clang, it doesn't like > > us passing an expression to the inline asm. > > > > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it > > doesn't have to just be a variable name. > > > > This patch seems to fix it. Not sure if that's just papering over it though. > > > > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h > > index 1ee0f22313ee..75fcb4370d96 100644 > > --- a/arch/powerpc/include/asm/bug.h > > +++ b/arch/powerpc/include/asm/bug.h > > @@ -119,7 +119,7 @@ __label_warn_on: \ > > \ > > WARN_ENTRY(PPC_TLNEI " %4, 0", \ > > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ > > - __label_warn_on, "r" (x)); \ > > + __label_warn_on, "r" (!!(x))); \ > > break; \ > > __label_warn_on: \ > > __ret_warn_on = true; \ > > > > > > Generates: > > > > c0000000008e2ac0 <.klist_add_tail>: > > c0000000008e2ac0: 7c 08 02 a6 mflr r0 > > c0000000008e2ac4: f8 01 00 10 std r0,16(r1) > > c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1) > > c0000000008e2acc: fb a1 00 78 std r29,120(r1) > > c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3 > > c0000000008e2ad4: 38 60 00 01 li r3,1 > > c0000000008e2ad8: fb c1 00 80 std r30,128(r1) > > c0000000008e2adc: 7c 9e 23 78 mr r30,r4 > > c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24 > > c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <- > > c0000000008e2ae8: fb 81 00 70 std r28,112(r1) > > c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8 > > c0000000008e2af0: fb 9d 00 08 std r28,8(r29) > > c0000000008e2af4: fb 9d 00 10 std r28,16(r29) > > c0000000008e2af8: 90 64 00 00 stw r3,0(r4) > > c0000000008e2afc: fb dd 00 00 std r30,0(r29) > > c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <- > > Thank you for the in-depth explanation and triage! I have gone ahead and > created a standalone reproducer that shows this based on the > preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634 > so the LLVM developers can take a look. Based on Eli Friedman's comment in the bug, would something like this work and not regress GCC? I noticed that the BUG_ON macro does a cast as well. Nick pointed out to me privately that we have run into what seems like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I"). Cheers, Nathan diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 1ee0f22313ee..35022667f57d 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -119,7 +119,7 @@ __label_warn_on: \ \ WARN_ENTRY(PPC_TLNEI " %4, 0", \ BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ - __label_warn_on, "r" (x)); \ + __label_warn_on, "r" ((__force long)(x))); \ break; \ __label_warn_on: \ __ret_warn_on = true; \ ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto 2021-08-26 23:55 ` Nathan Chancellor @ 2021-08-27 7:53 ` Michael Ellerman 0 siblings, 0 replies; 24+ messages in thread From: Michael Ellerman @ 2021-08-27 7:53 UTC (permalink / raw) To: Nathan Chancellor Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev, Christophe Leroy, Nick Desaulniers Nathan Chancellor <nathan@kernel.org> writes: > On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote: >> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote: >> > Nathan Chancellor <nathan@kernel.org> writes: >> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote: >> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more >> > >> flexibility to GCC. >> > ... >> > > >> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better >> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in >> > > klist_add_tail to trigger over and over on boot when compiling with >> > > clang: >> > > ... >> > >> > This patch seems to fix it. Not sure if that's just papering over it though. >> > >> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h >> > index 1ee0f22313ee..75fcb4370d96 100644 >> > --- a/arch/powerpc/include/asm/bug.h >> > +++ b/arch/powerpc/include/asm/bug.h >> > @@ -119,7 +119,7 @@ __label_warn_on: \ >> > \ >> > WARN_ENTRY(PPC_TLNEI " %4, 0", \ >> > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ >> > - __label_warn_on, "r" (x)); \ >> > + __label_warn_on, "r" (!!(x))); \ >> > break; \ >> > __label_warn_on: \ >> > __ret_warn_on = true; \ >> > >> >> Thank you for the in-depth explanation and triage! I have gone ahead and >> created a standalone reproducer that shows this based on the >> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634 >> so the LLVM developers can take a look. > > Based on Eli Friedman's comment in the bug, would something like this > work and not regress GCC? I noticed that the BUG_ON macro does a cast as > well. Nick pointed out to me privately that we have run into what seems > like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix > sign extension for RV64I"). Yes, I read that comment this morning, and then landed at the same fix via digging through the history of our bug code. We in fact fixed a similar bug 16 years ago :} 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels") Which is when we first started adding the cast to long. > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h > index 1ee0f22313ee..35022667f57d 100644 > --- a/arch/powerpc/include/asm/bug.h > +++ b/arch/powerpc/include/asm/bug.h > @@ -119,7 +119,7 @@ __label_warn_on: \ > \ > WARN_ENTRY(PPC_TLNEI " %4, 0", \ > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ > - __label_warn_on, "r" (x)); \ > + __label_warn_on, "r" ((__force long)(x))); \ > break; \ > __label_warn_on: \ > __ret_warn_on = true; \ Yeah that fixes the clang build for me. For GCC it seems to generate the same code in the simple cases: void test_warn_on_ulong(unsigned long b) { WARN_ON(b); } void test_warn_on_int(int b) { WARN_ON(b); } I get: 0000000000000020 <.test_warn_on_ulong>: 20: 0b 03 00 00 tdnei r3,0 24: 4e 80 00 20 blr 28: 60 00 00 00 nop 2c: 60 00 00 00 nop 0000000000000030 <.test_warn_on_int>: 30: 0b 03 00 00 tdnei r3,0 34: 4e 80 00 20 blr Both before and after the change. So that's good. For: void test_warn_on_int_addition(int b) { WARN_ON(b+1); } Before I get: 0000000000000040 <.test_warn_on_int_addition>: 40: 38 63 00 01 addi r3,r3,1 44: 0b 03 00 00 tdnei r3,0 48: 4e 80 00 20 blr vs after: 0000000000000040 <.test_warn_on_int_addition>: 40: 38 63 00 01 addi r3,r3,1 44: 7c 63 07 b4 extsw r3,r3 48: 0b 03 00 00 tdnei r3,0 4c: 4e 80 00 20 blr So there's an extra sign extension we don't need, but I guess we can probably live with that. cheers ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy @ 2021-08-13 6:08 ` Nicholas Piggin 2021-08-18 15:06 ` Segher Boessenkool 2021-08-18 13:38 ` Michael Ellerman 2 siblings, 1 reply; 24+ messages in thread From: Nicholas Piggin @ 2021-08-13 6:08 UTC (permalink / raw) To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman, Paul Mackerras Cc: linux-kernel, linuxppc-dev Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am: > powerpc BUG_ON() and WARN_ON() are based on using twnei instruction. > > For catching simple conditions like a variable having value 0, this > is efficient because it does the test and the trap at the same time. > But most conditions used with BUG_ON or WARN_ON are more complex and > forces GCC to format the condition into a 0 or 1 value in a register. > This will usually require 2 to 3 instructions. > > The most efficient solution would be to use __builtin_trap() because > GCC is able to optimise the use of the different trap instructions > based on the requested condition, but this is complex if not > impossible for the following reasons: > - __builtin_trap() is a non-recoverable instruction, so it can't be > used for WARN_ON > - Knowing which line of code generated the trap would require the > analysis of DWARF information. This is not a feature we have today. > > As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops") > the way WARN_ON() is implemented is suboptimal. That commit also > mentions an issue with 'long long' condition. It fixed it for > WARN_ON() but the same problem still exists today with BUG_ON() on > PPC32. It will be fixed by using the generic implementation. > > By using the generic implementation, gcc will naturally generate a > branch to the unconditional trap generated by BUG(). > > As modern powerpc implement zero-cycle branch, > that's even more efficient. > > And for the functions using WARN_ON() and its return, the test > on return from WARN_ON() is now also used for the WARN_ON() itself. > > On PPC64 we don't want it because we want to be able to use CFAR > register to track how we entered the code that trapped. The CFAR > register would be clobbered by the branch. > > A simple test function: > > unsigned long test9w(unsigned long a, unsigned long b) > { > if (WARN_ON(!b)) > return 0; > return a / b; > } > > Before the patch: > > 0000046c <test9w>: > 46c: 7c 89 00 34 cntlzw r9,r4 > 470: 55 29 d9 7e rlwinm r9,r9,27,5,31 > 474: 0f 09 00 00 twnei r9,0 > 478: 2c 04 00 00 cmpwi r4,0 > 47c: 41 82 00 0c beq 488 <test9w+0x1c> > 480: 7c 63 23 96 divwu r3,r3,r4 > 484: 4e 80 00 20 blr > > 488: 38 60 00 00 li r3,0 > 48c: 4e 80 00 20 blr > > After the patch: > > 00000468 <test9w>: > 468: 2c 04 00 00 cmpwi r4,0 > 46c: 41 82 00 0c beq 478 <test9w+0x10> > 470: 7c 63 23 96 divwu r3,r3,r4 > 474: 4e 80 00 20 blr > > 478: 0f e0 00 00 twui r0,0 > 47c: 38 60 00 00 li r3,0 > 480: 4e 80 00 20 blr That's clearly better because we have a branch anyway. > > So we see before the patch we need 3 instructions on the likely path > to handle the WARN_ON(). With the patch the trap goes on the unlikely > path. > > See below the difference at the entry of system_call_exception where > we have several BUG_ON(), allthough less impressing. > > With the patch: > > 00000000 <system_call_exception>: > 0: 81 6a 00 84 lwz r11,132(r10) > 4: 90 6a 00 88 stw r3,136(r10) > 8: 71 60 00 02 andi. r0,r11,2 > c: 41 82 00 70 beq 7c <system_call_exception+0x7c> > 10: 71 60 40 00 andi. r0,r11,16384 > 14: 41 82 00 6c beq 80 <system_call_exception+0x80> > 18: 71 6b 80 00 andi. r11,r11,32768 > 1c: 41 82 00 68 beq 84 <system_call_exception+0x84> > 20: 94 21 ff e0 stwu r1,-32(r1) > 24: 93 e1 00 1c stw r31,28(r1) > 28: 7d 8c 42 e6 mftb r12 > ... > 7c: 0f e0 00 00 twui r0,0 > 80: 0f e0 00 00 twui r0,0 > 84: 0f e0 00 00 twui r0,0 > > Without the patch: > > 00000000 <system_call_exception>: > 0: 94 21 ff e0 stwu r1,-32(r1) > 4: 93 e1 00 1c stw r31,28(r1) > 8: 90 6a 00 88 stw r3,136(r10) > c: 81 6a 00 84 lwz r11,132(r10) > 10: 69 60 00 02 xori r0,r11,2 > 14: 54 00 ff fe rlwinm r0,r0,31,31,31 > 18: 0f 00 00 00 twnei r0,0 > 1c: 69 60 40 00 xori r0,r11,16384 > 20: 54 00 97 fe rlwinm r0,r0,18,31,31 > 24: 0f 00 00 00 twnei r0,0 > 28: 69 6b 80 00 xori r11,r11,32768 > 2c: 55 6b 8f fe rlwinm r11,r11,17,31,31 > 30: 0f 0b 00 00 twnei r11,0 > 34: 7d 8c 42 e6 mftb r12 This one possibly the branches end up in predictors, whereas conditional trap is always just speculated not to hit. Branches may also have a throughput limit on execution whereas trap could be more (1 per cycle vs 4 per cycle on POWER9). On typical ppc32 CPUs, maybe it's a more obvious win. As you say there is the CFAR issue as well which makes it a problem for 64s. It would have been nice if it could use the same code though. Maybe one day gcc's __builtin_trap() will become smart enough around conditional statements that it it generates better code and tries to avoid branches. Thanks, Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-13 6:08 ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Nicholas Piggin @ 2021-08-18 15:06 ` Segher Boessenkool 2021-08-26 3:26 ` Nicholas Piggin 0 siblings, 1 reply; 24+ messages in thread From: Segher Boessenkool @ 2021-08-18 15:06 UTC (permalink / raw) To: Nicholas Piggin Cc: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman, Paul Mackerras, linuxppc-dev, linux-kernel On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: > This one possibly the branches end up in predictors, whereas conditional > trap is always just speculated not to hit. Branches may also have a > throughput limit on execution whereas trap could be more (1 per cycle > vs 4 per cycle on POWER9). I thought only *taken* branches are just one per cycle? And those branches are only taken for the exceptional condition (or the case where we do not care about performance, anyway, if we do have an error most of the time ;-) ) > On typical ppc32 CPUs, maybe it's a more obvious win. As you say there > is the CFAR issue as well which makes it a problem for 64s. It would > have been nice if it could use the same code though. On 64-bit the code looks better for the no-error path as well. > Maybe one day gcc's __builtin_trap() will become smart enough around > conditional statements that it it generates better code and tries to > avoid branches. Internally *all* traps are conditional, in GCC. It also can optimise them quite well. There must be something in the kernel macros that prevents good optimisation. Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-18 15:06 ` Segher Boessenkool @ 2021-08-26 3:26 ` Nicholas Piggin 2021-08-26 12:49 ` Segher Boessenkool 0 siblings, 1 reply; 24+ messages in thread From: Nicholas Piggin @ 2021-08-26 3:26 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: >> This one possibly the branches end up in predictors, whereas conditional >> trap is always just speculated not to hit. Branches may also have a >> throughput limit on execution whereas trap could be more (1 per cycle >> vs 4 per cycle on POWER9). > > I thought only *taken* branches are just one per cycle? Taken branches are fetched by the front end at one per cycle (assuming they hit the BTAC), but all branches have to be executed by BR at one per cycle. On POWER9 BR even has to execute some other things like mflr as well, but at least that's improved on POWER10. Trap is executed at 4 per cycle and will never use branch table entries or alias with a non-tagged predictor and mispredict. > And those > branches are only taken for the exceptional condition (or the case where > we do not care about performance, anyway, if we do have an error most of > the time ;-) ) It's not that big a deal, but trap is really the best instruction for this. > >> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there >> is the CFAR issue as well which makes it a problem for 64s. It would >> have been nice if it could use the same code though. > > On 64-bit the code looks better for the no-error path as well. > >> Maybe one day gcc's __builtin_trap() will become smart enough around >> conditional statements that it it generates better code and tries to >> avoid branches. > > Internally *all* traps are conditional, in GCC. It also can optimise > them quite well. There must be something in the kernel macros that > prevents good optimisation. I did take a look at it at one point. One problem is that the kernel needs the address of the trap instruction to create the entry for it. The other problem is that __builtin_trap does not return so it can't be used for WARN. LLVM at least seems to have a __builtin_debugtrap which does return. The first problem seems like the show stopper though. AFAIKS it would need a special builtin support that does something to create the table entry, or a guarantee that we could put an inline asm right after the builtin as a recognized pattern and that would give us the instruction following the trap. Thanks, Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 3:26 ` Nicholas Piggin @ 2021-08-26 12:49 ` Segher Boessenkool 2021-08-26 13:57 ` Nicholas Piggin 0 siblings, 1 reply; 24+ messages in thread From: Segher Boessenkool @ 2021-08-26 12:49 UTC (permalink / raw) To: Nicholas Piggin Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras Hi! On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote: > Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: > > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: > >> This one possibly the branches end up in predictors, whereas conditional > >> trap is always just speculated not to hit. Branches may also have a > >> throughput limit on execution whereas trap could be more (1 per cycle > >> vs 4 per cycle on POWER9). > > > > I thought only *taken* branches are just one per cycle? > > Taken branches are fetched by the front end at one per cycle (assuming > they hit the BTAC), but all branches have to be executed by BR at one > per cycle This is not true. (Simple) predicted not-taken conditional branches are just folded out, never hit the issue queues. And they are fetched as many together as fit in a fetch group, can complete without limits as well. The BTAC is a frontend thing, used for target address prediction. It does not limit execution. Correctly predicted simple conditional branches just get their prediction validated (and that is not done in the execution units). Incorrectly predicted branches the same, but those cause a redirect and refetch. > > Internally *all* traps are conditional, in GCC. It also can optimise > > them quite well. There must be something in the kernel macros that > > prevents good optimisation. > > I did take a look at it at one point. > > One problem is that the kernel needs the address of the trap instruction > to create the entry for it. The other problem is that __builtin_trap > does not return so it can't be used for WARN. LLVM at least seems to > have a __builtin_debugtrap which does return. This is <https://gcc.gnu.org/PR99299>. > The first problem seems like the show stopper though. AFAIKS it would > need a special builtin support that does something to create the table > entry, or a guarantee that we could put an inline asm right after the > builtin as a recognized pattern and that would give us the instruction > following the trap. I'm not quite sure what this means. Can't you always just put a bla: asm(""); in there, and use the address of "bla"? If not, you need to say a lot more about what you actually want to do :-/ Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 12:49 ` Segher Boessenkool @ 2021-08-26 13:57 ` Nicholas Piggin 2021-08-26 14:37 ` Segher Boessenkool 0 siblings, 1 reply; 24+ messages in thread From: Nicholas Piggin @ 2021-08-26 13:57 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm: > Hi! > > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote: >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: >> >> This one possibly the branches end up in predictors, whereas conditional >> >> trap is always just speculated not to hit. Branches may also have a >> >> throughput limit on execution whereas trap could be more (1 per cycle >> >> vs 4 per cycle on POWER9). >> > >> > I thought only *taken* branches are just one per cycle? >> >> Taken branches are fetched by the front end at one per cycle (assuming >> they hit the BTAC), but all branches have to be executed by BR at one >> per cycle > > This is not true. (Simple) predicted not-taken conditional branches are > just folded out, never hit the issue queues. And they are fetched as > many together as fit in a fetch group, can complete without limits as > well. No, they are all dispatched and issue to the BRU for execution. It's trivial to construct a test of a lot of not taken branches in a row and time a loop of it to see it executes at 1 cycle per branch. > The BTAC is a frontend thing, used for target address prediction. It > does not limit execution. I didn't say it did. > Correctly predicted simple conditional branches just get their prediction > validated (and that is not done in the execution units). Incorrectly > predicted branches the same, but those cause a redirect and refetch. How could it validate prediction without issuing? It wouldn't know when sources are ready. > >> > Internally *all* traps are conditional, in GCC. It also can optimise >> > them quite well. There must be something in the kernel macros that >> > prevents good optimisation. >> >> I did take a look at it at one point. >> >> One problem is that the kernel needs the address of the trap instruction >> to create the entry for it. The other problem is that __builtin_trap >> does not return so it can't be used for WARN. LLVM at least seems to >> have a __builtin_debugtrap which does return. > > This is <https://gcc.gnu.org/PR99299>. Aha. >> The first problem seems like the show stopper though. AFAIKS it would >> need a special builtin support that does something to create the table >> entry, or a guarantee that we could put an inline asm right after the >> builtin as a recognized pattern and that would give us the instruction >> following the trap. > > I'm not quite sure what this means. Can't you always just put a > > bla: asm(""); > > in there, and use the address of "bla"? Not AFAIKS. Put it where? > If not, you need to say a lot > more about what you actually want to do :-/ We need to put the address (or relative offset) of the trap instruction into an entry in a different section. Basically this: __builtin_trap(); asm ("1: \n\t" " .section __bug_table,\"aw\" \n\t" "2: .4byte 1b - 2b - 4 \n\t" " .previous"); Where the 1: label must follow immediately after the trap instruction, and where the asm must be emitted even for the case of no-return traps (but anything following the asm could be omitted). Thanks, Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 13:57 ` Nicholas Piggin @ 2021-08-26 14:37 ` Segher Boessenkool 2021-08-26 15:04 ` Nicholas Piggin 0 siblings, 1 reply; 24+ messages in thread From: Segher Boessenkool @ 2021-08-26 14:37 UTC (permalink / raw) To: Nicholas Piggin Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote: > Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm: > > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote: > >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: > >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: > >> >> This one possibly the branches end up in predictors, whereas conditional > >> >> trap is always just speculated not to hit. Branches may also have a > >> >> throughput limit on execution whereas trap could be more (1 per cycle > >> >> vs 4 per cycle on POWER9). > >> > > >> > I thought only *taken* branches are just one per cycle? > >> > >> Taken branches are fetched by the front end at one per cycle (assuming > >> they hit the BTAC), but all branches have to be executed by BR at one > >> per cycle > > > > This is not true. (Simple) predicted not-taken conditional branches are > > just folded out, never hit the issue queues. And they are fetched as > > many together as fit in a fetch group, can complete without limits as > > well. > > No, they are all dispatched and issue to the BRU for execution. It's > trivial to construct a test of a lot of not taken branches in a row > and time a loop of it to see it executes at 1 cycle per branch. (s/dispatched/issued/) Huh, this was true on p8 already. Sorry for my confusion. In my defence, this doesn't matter for performance on "real code". > > Correctly predicted simple conditional branches just get their prediction > > validated (and that is not done in the execution units). Incorrectly > > predicted branches the same, but those cause a redirect and refetch. > > How could it validate prediction without issuing? It wouldn't know when > sources are ready. In the backend. But that is just how it worked on older cores :-/ > >> The first problem seems like the show stopper though. AFAIKS it would > >> need a special builtin support that does something to create the table > >> entry, or a guarantee that we could put an inline asm right after the > >> builtin as a recognized pattern and that would give us the instruction > >> following the trap. > > > > I'm not quite sure what this means. Can't you always just put a > > > > bla: asm(""); > > > > in there, and use the address of "bla"? > > Not AFAIKS. Put it where? After wherever you want to know the address after. You will have to make sure they stay together somehow. It is much easier to get the address of something, not the address after it. If you know it is just one insn anyway, that isn't hard to handle either (even if prefixed insns exist). > > If not, you need to say a lot > > more about what you actually want to do :-/ > > We need to put the address (or relative offset) of the trap instruction > into an entry in a different section. Basically this: > > __builtin_trap(); > asm ("1: \n\t" > " .section __bug_table,\"aw\" \n\t" > "2: .4byte 1b - 2b - 4 \n\t" > " .previous"); > > Where the 1: label must follow immediately after the trap instruction, > and where the asm must be emitted even for the case of no-return traps > (but anything following the asm could be omitted). The address *after* the trap insn? That is much much harder than the address *of* the trap insn. Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 14:37 ` Segher Boessenkool @ 2021-08-26 15:04 ` Nicholas Piggin 2021-08-26 15:30 ` Segher Boessenkool 0 siblings, 1 reply; 24+ messages in thread From: Nicholas Piggin @ 2021-08-26 15:04 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am: > On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote: >> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm: >> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote: >> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am: >> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: >> >> >> This one possibly the branches end up in predictors, whereas conditional >> >> >> trap is always just speculated not to hit. Branches may also have a >> >> >> throughput limit on execution whereas trap could be more (1 per cycle >> >> >> vs 4 per cycle on POWER9). >> >> > >> >> > I thought only *taken* branches are just one per cycle? >> >> >> >> Taken branches are fetched by the front end at one per cycle (assuming >> >> they hit the BTAC), but all branches have to be executed by BR at one >> >> per cycle >> > >> > This is not true. (Simple) predicted not-taken conditional branches are >> > just folded out, never hit the issue queues. And they are fetched as >> > many together as fit in a fetch group, can complete without limits as >> > well. >> >> No, they are all dispatched and issue to the BRU for execution. It's >> trivial to construct a test of a lot of not taken branches in a row >> and time a loop of it to see it executes at 1 cycle per branch. > > (s/dispatched/issued/) ? > > Huh, this was true on p8 already. Sorry for my confusion. In my > defence, this doesn't matter for performance on "real code". > >> > Correctly predicted simple conditional branches just get their prediction >> > validated (and that is not done in the execution units). Incorrectly >> > predicted branches the same, but those cause a redirect and refetch. >> >> How could it validate prediction without issuing? It wouldn't know when >> sources are ready. > > In the backend. But that is just how it worked on older cores :-/ Okay. I don't know about older cores than POWER9. Backend would normally include execution though. Only other place you could do it if you don't issue/exec would be after it goes back in order, like completion. But that would be horrible for mispredict penalty. >> >> The first problem seems like the show stopper though. AFAIKS it would >> >> need a special builtin support that does something to create the table >> >> entry, or a guarantee that we could put an inline asm right after the >> >> builtin as a recognized pattern and that would give us the instruction >> >> following the trap. >> > >> > I'm not quite sure what this means. Can't you always just put a >> > >> > bla: asm(""); >> > >> > in there, and use the address of "bla"? >> >> Not AFAIKS. Put it where? > > After wherever you want to know the address after. You will have to > make sure they stay together somehow. I still don't follow. > It is much easier to get the address of something, not the address after > it. If you know it is just one insn anyway, that isn't hard to handle > either (even if prefixed insns exist). > >> > If not, you need to say a lot >> > more about what you actually want to do :-/ >> >> We need to put the address (or relative offset) of the trap instruction >> into an entry in a different section. Basically this: >> >> __builtin_trap(); >> asm ("1: \n\t" >> " .section __bug_table,\"aw\" \n\t" >> "2: .4byte 1b - 2b - 4 \n\t" >> " .previous"); >> >> Where the 1: label must follow immediately after the trap instruction, >> and where the asm must be emitted even for the case of no-return traps >> (but anything following the asm could be omitted). > > The address *after* the trap insn? That is much much harder than the > address *of* the trap insn. No the address of the trap instruction, hence the -4. I have the label after because that is the semantics I have from inline asm. If you could give a built in that put a label at the address of the trap instruction that could be used later by inline asm then that could work too: __builtin_labeled_trap("1:"); asm (" .section __bug_table,\"aw\" \n\t" "2: .4byte 1b - 2b \n\t" " .previous"); ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 15:04 ` Nicholas Piggin @ 2021-08-26 15:30 ` Segher Boessenkool 2021-08-27 1:28 ` Nicholas Piggin 0 siblings, 1 reply; 24+ messages in thread From: Segher Boessenkool @ 2021-08-26 15:30 UTC (permalink / raw) To: Nicholas Piggin Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote: > Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am: > >> No, they are all dispatched and issue to the BRU for execution. It's > >> trivial to construct a test of a lot of not taken branches in a row > >> and time a loop of it to see it executes at 1 cycle per branch. > > > > (s/dispatched/issued/) > > ? Dispatch is from decode to the issue queues. Issue is from there to execution units. Dispatch is in-order, issue is not. > >> How could it validate prediction without issuing? It wouldn't know when > >> sources are ready. > > > > In the backend. But that is just how it worked on older cores :-/ > > Okay. I don't know about older cores than POWER9. Backend would normally > include execution though. > Only other place you could do it if you don't > issue/exec would be after it goes back in order, like completion. You do not have to do the verification in-order: the insn cannot finish until it is no longer speculative, that takes care of all ordering needed. > But that would be horrible for mispredict penalty. See the previous point. Also, any insn known to be mispredicted can be flushed immediately anyway. > >> >> The first problem seems like the show stopper though. AFAIKS it would > >> >> need a special builtin support that does something to create the table > >> >> entry, or a guarantee that we could put an inline asm right after the > >> >> builtin as a recognized pattern and that would give us the instruction > >> >> following the trap. > >> > > >> > I'm not quite sure what this means. Can't you always just put a > >> > > >> > bla: asm(""); > >> > > >> > in there, and use the address of "bla"? > >> > >> Not AFAIKS. Put it where? > > > > After wherever you want to know the address after. You will have to > > make sure they stay together somehow. > > I still don't follow. some_thing_you_want_to_know_the_address_after_let_us_call_it_A; empty_asm_that_we_can_take_the_address_of_known_as_B; You have to make sure the compiler keeps A and B together, does not insert anything between them, does put them in the assembler output in the same fragment, etc. > If you could give a built in that put a label at the address of the trap > instruction that could be used later by inline asm then that could work > too: > > __builtin_labeled_trap("1:"); > asm (" .section __bug_table,\"aw\" \n\t" > "2: .4byte 1b - 2b \n\t" > " .previous"); How could a compiler do anything like that?! Segher ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-08-26 15:30 ` Segher Boessenkool @ 2021-08-27 1:28 ` Nicholas Piggin 0 siblings, 0 replies; 24+ messages in thread From: Nicholas Piggin @ 2021-08-27 1:28 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin Herrenschmidt, Christophe Leroy, linux-kernel, linuxppc-dev, Michael Ellerman, Paul Mackerras Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am: > On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote: >> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am: >> >> No, they are all dispatched and issue to the BRU for execution. It's >> >> trivial to construct a test of a lot of not taken branches in a row >> >> and time a loop of it to see it executes at 1 cycle per branch. >> > >> > (s/dispatched/issued/) >> >> ? > > Dispatch is from decode to the issue queues. Issue is from there to > execution units. Dispatch is in-order, issue is not. I know what those mean, I wonder what your s/dispatched/issued means. I was saying they are dispatched in response to you saying they never hit the issue queue. > >> >> How could it validate prediction without issuing? It wouldn't know when >> >> sources are ready. >> > >> > In the backend. But that is just how it worked on older cores :-/ >> >> Okay. I don't know about older cores than POWER9. Backend would normally >> include execution though. >> Only other place you could do it if you don't >> issue/exec would be after it goes back in order, like completion. > > You do not have to do the verification in-order: the insn cannot finish > until it is no longer speculative, that takes care of all ordering > needed. Branches *can* finish out of order and speculative as they do in P9 and P10. Are you talking about these CPUs or something else which can verify branches without issuing them? > >> But that would be horrible for mispredict penalty. > > See the previous point. Also, any insn known to be mispredicted can be > flushed immediately anyway. The point is it has to know sources (CR) to verify (aka execute) the branch prediction was right, and if it needs sources then it needs to either issue and execute in the out of order part, or it needs to wait until completion which would seem to be prohibitively expensive. I am interested to know how it works. > >> >> >> The first problem seems like the show stopper though. AFAIKS it would >> >> >> need a special builtin support that does something to create the table >> >> >> entry, or a guarantee that we could put an inline asm right after the >> >> >> builtin as a recognized pattern and that would give us the instruction >> >> >> following the trap. >> >> > >> >> > I'm not quite sure what this means. Can't you always just put a >> >> > >> >> > bla: asm(""); >> >> > >> >> > in there, and use the address of "bla"? >> >> >> >> Not AFAIKS. Put it where? >> > >> > After wherever you want to know the address after. You will have to >> > make sure they stay together somehow. >> >> I still don't follow. > > some_thing_you_want_to_know_the_address_after_let_us_call_it_A; > empty_asm_that_we_can_take_the_address_of_known_as_B; > > You have to make sure the compiler keeps A and B together, does not > insert anything between them, does put them in the assembler output in > the same fragment, etc. How does all this help our problem of putting the address of the trap into the table? > >> If you could give a built in that put a label at the address of the trap >> instruction that could be used later by inline asm then that could work >> too: >> >> __builtin_labeled_trap("1:"); >> asm (" .section __bug_table,\"aw\" \n\t" >> "2: .4byte 1b - 2b \n\t" >> " .previous"); > > How could a compiler do anything like that?! How could it add a label at the trap instruction it generates? It didn't seem like an outlandish thing to do, but I'm not a compiler writer. It was just a handwaving idea to show what we want to be able to do. Thanks, Nick ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy 2021-08-13 6:08 ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Nicholas Piggin @ 2021-08-18 13:38 ` Michael Ellerman 2 siblings, 0 replies; 24+ messages in thread From: Michael Ellerman @ 2021-08-18 13:38 UTC (permalink / raw) To: Christophe Leroy, Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras Cc: linux-kernel, linuxppc-dev On Tue, 13 Apr 2021 16:38:09 +0000 (UTC), Christophe Leroy wrote: > powerpc BUG_ON() and WARN_ON() are based on using twnei instruction. > > For catching simple conditions like a variable having value 0, this > is efficient because it does the test and the trap at the same time. > But most conditions used with BUG_ON or WARN_ON are more complex and > forces GCC to format the condition into a 0 or 1 value in a register. > This will usually require 2 to 3 instructions. > > [...] Applied to powerpc/next. [1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 https://git.kernel.org/powerpc/c/db87a7199229b75c9996bf78117eceb81854fce2 [2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto https://git.kernel.org/powerpc/c/1e688dd2a3d6759d416616ff07afc4bb836c4213 cheers ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2021-08-27 7:53 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy 2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy 2021-08-13 6:19 ` Nicholas Piggin 2021-08-15 3:49 ` Michael Ellerman 2021-08-25 21:25 ` Nathan Chancellor 2021-08-26 3:21 ` Michael Ellerman 2021-08-26 6:37 ` Christophe Leroy 2021-08-26 13:47 ` Segher Boessenkool 2021-08-26 14:45 ` Michael Ellerman 2021-08-26 14:53 ` Christophe Leroy 2021-08-26 14:12 ` Segher Boessenkool 2021-08-26 18:54 ` Nathan Chancellor 2021-08-26 23:55 ` Nathan Chancellor 2021-08-27 7:53 ` Michael Ellerman 2021-08-13 6:08 ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Nicholas Piggin 2021-08-18 15:06 ` Segher Boessenkool 2021-08-26 3:26 ` Nicholas Piggin 2021-08-26 12:49 ` Segher Boessenkool 2021-08-26 13:57 ` Nicholas Piggin 2021-08-26 14:37 ` Segher Boessenkool 2021-08-26 15:04 ` Nicholas Piggin 2021-08-26 15:30 ` Segher Boessenkool 2021-08-27 1:28 ` Nicholas Piggin 2021-08-18 13:38 ` Michael Ellerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).